#130273: Linux mountpoint issues

# Wed Oct 12 19:00:52 2011 Andrew Deason - Ticket created

Subject:	Linux panic on recursive mtpt rmdir
Date:	Wed, 12 Oct 2011 18:04:21 -0500
To:	security@openafs.org
From:	Andrew Deason <adeason@sinenomine.net>

Download (untitled)
text/plain 2.5k

I am submitting this to the security queue, because it allows at least
anyone with 'a' privs anywhere in AFS to panic the client. I think this
is representative of the "we're in the kernel, so almost any bug is a
local DoS" notion, so I don't know how much of a "security" issue this
should be in general. But since it's so trivial to execute... I'm just
being safe.

Doing the following appears to always cause a BUG on Linux (so, I think
depending on the config, causes a panic) if 'foo' is a volume:

fs mkm foo foo
cd foo
fs mkm foo foo
rmdir foo

This happens on at least RHEL5 (2.6.18) OpenAFS 1.4.14, and master; I
assume virtually all versions of either are applicable, since I don't
see much variation in the relevant code paths. The BUG is in may_delete:

BUG_ON(victim->d_parent->d_inode != dir);

This should not be surprising, as this should be rather familiar:
https://lists.openafs.org/pipermail/openafs-devel/2005-December/013334.html

I believe the reason is that, while afs_linux_lookup tries to invalidate
the alias dentry when we look up a directory, we cannot invalidate a
dentry that is in use. And when we rmdir, we look up the "parent" mount
point, grab a ref, and then lookup the "child". So when we look up the
child, the parent dentry is the one associated with the inode, and we
can't invalidate it because it is in use by the caller. So we return the
"parent" dentry, which of course has a different d_parent (it's one
level higher) and so that bug triggers.

I am not entirely sure what can be done about this. The naive workaround
is just to add the duplicate dentry anyway when d_invalidate fails,
resulting in (at least) two dentries for the vol root inode. As far as I
know this is against the "rules", but it does not seem to immediately
panic the box or anything. So, one possibly terrible idea is to do that
for the lookup, but get something else (another background daemon or
whatever) to try and remove the duplicate dentries as quickly as
possible immediately afterwards.

Or we can just add multiple dentries on the inode, and just give our
best effort to try and reduce how many we have. In that linked thread,
chas says we used to not care how many dentries we had on the inode. Any
idea how that causes problems specifically? (I mean, I know _why_
conceptually Linux doesn't like it, but I'm not sure what specifically
will explode if we do) And even if doing that results in a panic or
something sometimes... well, that's better than the 100% of the time
that the current situation is.

--
Andrew Deason
adeason@sinenomine.net

# Fri Apr 13 17:54:18 2012 Andrew Deason - Correspondence added

Subject:	Re: [rt.central.org #130273] Linux panic on recursive mtpt rmdir
Date:	Fri, 13 Apr 2012 16:56:58 -0500
To:	"openafs-security@openafs.org" <openafs-security@openafs.org>
From:	Andrew Deason <adeason@sinenomine.net>

Download (untitled)
text/plain 2.3k

Attached is a possibly naive workaround. This avoids the panics in the
use cases I am aware of, but it may cause panics for other cases (such
as Russ' use case which was the reason for the original patch in 2005 by
chas). In addition, it has the issue that it makes "immediately
recursive" mountpoints unusable (that is, mountpoints like 'fs mkm foo
vol ; fs mkm foo/foo vol"). As mentioned in the commit message, I do not
see how to get around that without drastic changes.

I would like to ask at least Russ or chas to look at this; I'm not
exactly sure what specific operations were causing the panics in 2005,
so if they could test or explain what they were...

Anyway, if this turns out to still be an issue, I can think of a few
other approaches to address this problem:

- Instead of returning the dcache/inode for the volume root dir when we
lookup a mntpt, return some pseudo-vcache that has different vnode
ops structures. When we get various vnop requests on it, just pass on
the request to the 'real' root dir.

I can see how this may solve a problem or two with say, recursive
mountpoints, but I assume this doesn't solve the general problem
since we still eventually have the problem of "one directory one
parent".

- Actually have two mountpoints appear as two completely separate trees
(that is, we actually make two separate vcaches for the same root
fid). This is obviously not efficient, but I think it pretty much
does make the entire problem go away.

- We do the "right thing" and make mountpoints actual mounts in the VFS
layer. We cannot use the necessary "kernel mount" stuff since it's
GPL-only, but I think we can call the much higher-level do_mount()
(or if we need to, sys_mount() or something). If nothing else, even
in the face of future symbol GPL-ness changes or whatever, we could
always theoretically make a userspace upcall that just calls mount().

It may be simpler to just make bind mounts from foo/mtpt to
/afs/.vol:1234 (or whatever that magical path syntax was), so we
don't have to come up with the new per-fs metadata and whatever for
the new mount. I'm not sure if that's really necessary, but it's an
idea.

I think the time may have come for us to finally do something like
this, and I can expend the work to do it if this sounds like the way
we should go.

--
Andrew Deason
adeason@sinenomine.net

Download gco130273-linux-mntpt-dentries.diff.gz
application/octet-stream 2k

Message body not shown because it is not plain text.

# Fri Feb 14 03:00:24 2014 Jeffrey Altman - Queue changed from to openafs-bugs

# Fri Feb 14 03:00:25 2014 Jeffrey Altman - Given to adeason

# Wed Feb 26 13:54:34 2014 Andrew Deason - Correspondence added

Download (untitled)
text/plain 1.1k

I'm going to be pointing some people at this ticket for more information, so I should start
including some more details in here. The ticket traffic above this was for the initial
manifestation of this issue, which was solved in OpenAFS 1.6.2 (commit
dda3ea5f9ddda389955249e17a2e97b2e5ce7f1c).

With RHEL 5.10 and 6.5, our workaround broke and the issue is generally back (d_splice_alias
in some situations will not let us add a dir alias at all, and even if it does it logs a WARN). It was
introduced in versions 2.6.18-367.el5 and 2.6.32-408.el6 (which are between 5.9-5.10 and
6.4-6.5, I believe). Last time we worked around the issue by making the dir inodes act like
redirecting symlinks and using d_automount where available. But now it is likely we will have
to implement a completely new approach, like one of the ones described in the thread here
<https://lists.openafs.org/pipermail/openafs-devel/2012-June/018886.html>.

Currently I'm looking at implementing the "bind-mount" method of solving this, but I'll be
raising it on the list(s) to see if there are opinions on any other approach.

--
Andrew Deason
adeason@sinenomine.net

# Wed Feb 26 13:54:34 2014 Andrew Deason - Status changed from 'new' to 'open'

# Wed Feb 26 13:58:34 2014 Andrew Deason - Subject changed from 'Linux panic on recursive mtpt rmdir' to 'Linux mountpoint issues'

RT Self Service / #130273: Linux mountpoint issues