The Basics
Id: 131725
Status: open
Priority: 0/
Queue: openafs-bugs

Dates
Created: Thu Sep 05 13:54:16 2013
Starts: Not set
Started: Not set
Last Contact: Mon Oct 07 15:48:29 2013
Due: Not set
Closed: Not set
Updated: Mon Oct 07 15:48:29 2013 by adeason

History
Subject: rs_aix61-tl5.tar.gz (2013/08/12 15:30:49) fails with AIX 6.1 tl 7
Date: Thu, 5 Sep 2013 17:54:06 +0000
To: "openafs-bugs@openafs.org" <openafs-bugs@openafs.org>
From: "Biswas, Brian P" <bbiswas@email.unc.edu>
rs_aix61-tl5.tar.gz (2013/08/12 15:30:49) from openafs.org doesn't work
on AIX 6.1 technology level 7.

The AFS kernel extension loads, but afsd gets stuck when trying
to mount root.afs.

Here are the debug messages when running afsd with the -debug option:

doSweepAFSCache: Current directory entry:
inode=34061, reclen=20, name='V31997'
doSweepAFSCache: Current directory entry:
inode=34062, reclen=20, name='V31998'
doSweepAFSCache: Current directory entry:
inode=34063, reclen=20, name='V31999'
doSweepAFSCache: Closing cache directory.
doSweepAFSCache: Closing cache directory.
afsd: 32000 out of 32000 data cache files found in sweep 1.
afsd: Calling AFSOP_CACHEINFO: dcache file is '/usr/vice/cache/CacheItems'
afsd: Calling AFSOP_CELLINFO: cell info file is '/usr/vice/cache/CellItems'
afsd: Forking AFS daemon.
afsd: Forking Check Server Daemon.
afsd: Forking 6 background daemons.
afsd: Calling AFSOP_VOLUMEINFO: volume info file is '/usr/vice/cache/VolumeItems'
afsd: Calling AFSOP_CACHEFILE for each of the 32000 files in '/usr/vice/cache'
afsd: Calling AFSOP_GO with cacheSetTime = 0
afsd: All AFS daemons started.
afsd: Forking trunc-cache daemon.
afsd: Mounting the AFS root on '/afs', flags: 0.


I've also built openafs 1.6.5 from source on technology level 7,
but see the same behavior.
Subject: Re: [rt.central.org #131725] AutoReply: rs_aix61-tl5.tar.gz (2013/08/12 15:30:49) fails with AIX 6.1 tl 7
Date: Tue, 10 Sep 2013 14:40:27 +0000
To: "<openafs-bugs@openafs.org>" <openafs-bugs@openafs.org>
From: "Biswas, Brian P" <bbiswas@email.unc.edu>
Further debugging shows that afsd hangs in the routine
aix_vmount (file openafs-1.6.5/src/afsd/afsd_kernel.c):

/* Do the actual mount system call */

error = vmount(vmountp, size);

i.e. it never returns from the above vmount call.

--Brian




On Tue Sep 10 10:41:55 2013, bbiswas@email.unc.edu wrote:
> Further debugging shows that afsd is hung up when in the routine
> aix_vmount (file openafs-1.6.5/src/afsd/afsd_kernel.c):
>
> /* Do the actual mount system call */
>
> error = vmount(vmountp, size);
>
> i.e. it never returns from the above vmount call.

Unless someone just happens to know what we need to do to fix this, you (or someone) will
probably need to get information from the kernel debugger. And possibly someone will need
to spend some more time on an AIX 6.1 TL7 machine (or similar) with live debugging. I don't
have access to any such machine, though I may in the future; you can either run some
commands yourself via instructions in this ticket, or provide me with access to such a
machine, or wait for someone else to investigate it (Derrick, or someone else with access to a
relevant AIX machine).

Anyway, to start off, we need at least a stack trace to know what's going on. This may or may
not work (without a machine I am guessing at quite a bit), but something like the following
may help:

# kdb
(0)> p * | grep afsd
pvproc+007800 30 afsd ACTIVE 001E0AA 002208A 0000000016343400 0 0001
(0)> stack 30
pvthread+001E00 STACK:
[stack info]
(0)>

The '30' in "stack 30" comes from the second column in the "p *" output; run that for every line
you find, and provide the stack info that's output.
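
If there are many matching lines, the per-line step can be scripted. The following is only a rough sketch, assuming the listing has the same column layout as the sample above (slot number in the second column, process name in the third); the function names are made up for illustration and are not part of kdb or OpenAFS:

```python
def slots_from_proc_listing(listing, name="afsd"):
    """Return the second column of every listing line whose third column is `name`.

    Assumes the layout shown in the sample: ADDRESS SLOT NAME STATE ...
    """
    slots = []
    for line in listing.splitlines():
        fields = line.split()
        # Prompt lines such as "(0)> p * | grep afsd" have a different third
        # field, so they are skipped by the name comparison.
        if len(fields) >= 3 and fields[2] == name:
            slots.append(fields[1])
    return slots


def stack_commands(listing, name="afsd"):
    """Build one 'stack N' command string per matching line."""
    return ["stack " + slot for slot in slots_from_proc_listing(listing, name)]


# Sample copied from the example session above.
SAMPLE = """\
(0)> p * | grep afsd
pvproc+007800 30 afsd ACTIVE 001E0AA 002208A 0000000016343400 0 0001
"""

if __name__ == "__main__":
    for command in stack_commands(SAMPLE):
        print(command)
```

The resulting command strings would still need to be typed or pasted into kdb by hand; the sketch does not drive the debugger itself.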

--
Andrew Deason
adeason@sinenomine.net
Subject: Re: [rt.central.org #131725] AutoReply: rs_aix61-tl5.tar.gz (2013/08/12 15:30:49) fails with AIX 6.1 tl 7
Date: Wed, 18 Sep 2013 18:08:26 +0000
To: "openafs-bugs@openafs.org" <openafs-bugs@openafs.org>
From: "Biswas, Brian P" <bbiswas@email.unc.edu>

Let us know if you need access to our aix 6.1 tl 7 machine for debugging.

Here is the requested trace:


# kdb
START END <name>
0000000000001000 00000000040E0000 start+000FD8
F00000002FF47600 F00000002FFDF9C0 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000
read vscsi_scsi_ptrs OK, ptr = 0x0

(0)> p * | grep afsd
pvproc+005400 21 afsd ACTIVE 01500EA 03900AA 00000008782BE400 0 0001
pvproc+007000 28 afsd ACTIVE 01C004A 01500EA 000000086C2FB400 0 0001
pvproc+00A800 42 afsd ACTIVE 02A0002 01500EA 00000008082C2400 0 0001
pvproc+00D400 53 afsd ACTIVE 0350088 01500EA 000000085C2F7400 0 0001
pvproc+00F800 62 afsd ACTIVE 03E007C 01500EA 000000087C2FF400 0 0001
pvproc+00FC00 63 afsd ACTIVE 03F007E 01500EA 000000080C303400 0 0001
pvproc+010000 64 afsd ACTIVE 0400080 01500EA 000000081C307400 0 0001
pvproc+010400 65 afsd ACTIVE 0410082 01500EA 000000083C30F400 0 0001
pvproc+010800 66 afsd ACTIVE 0420084 01500EA 00000008142E5400 0 0001
pvproc+010C00 67 afsd ACTIVE 0430086 01500EA 0000000824309400 0 0001
pvproc+011000 68 afsd ACTIVE 0440088 01500EA 0000000860318400 0 0001
pvproc+011400 69 afsd ACTIVE 045008A 01500EA 0000000850314400 0 0001
pvproc+011800 70 afsd ACTIVE 046008C 01500EA 000000083C2CF400 0 0001
pvproc+011C00 71 afsd ACTIVE 047008E 01500EA 000000087831E400 0 0001
pvproc+012000 72 afsd ACTIVE 0480090 01500EA 0000000808322400 0 0001
pvproc+012400 73 afsd ACTIVE 0490092 01500EA 0000000818326400 0 0001
pvproc+012800 74 afsd ACTIVE 04A0094 01500EA 000000082832A400 0 0001

(0)> stack 21
pvthread+001500 STACK:
[000DA4EC]et_wait+0002AC (00000000000DA4EC, 8000000000001032,
0000000022824024 [??])
[004A8F68]malloc_thread+000148 (??, ??, ??)
[00256C90]procentry+000010 (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 28
pvthread+001C00 STACK:
[000D92D0]e_block_thread+000290 ()
[000D9F34]e_sleep_thread+0000F4 (??, ??, ??)
[004A3954]netisr_thread+000034 ()
[002577F4]threadentry+000094 (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 42
pvthread+002A00 STACK:
[000D92D0]e_block_thread+000290 ()
[000D9F28]e_sleep_thread+0000E8 (??, ??, ??)
[047B25EC]config_proc+0001AC (??, ??, ??)
[00256C90]procentry+000010 (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 53
pvthread+003500 STACK:
[000D92D0]e_block_thread+000290 ()
[000D9F28]e_sleep_thread+0000E8 (??, ??, ??)
[0036D110]j2PagerThread+0001B0 (??)
[002577F4]threadentry+000094 (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 62
pvthread+003E00 STACK:
[000D92D0]e_block_thread+000290 ()
[000A85CC]delay+00012C (??)
[04C0A884]auth_reaper+000044 ()
[00256C90]procentry+000010 (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 63
pvthread+003F00 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 64
pvthread+004000 STACK:
[000D92D0]e_block_thread+000290 ()
[04821130]dogthread+000130 (??)
[002577BC]threadentry+00005C (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 65
pvthread+004100 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 66
pvthread+004200 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 67
pvthread+004300 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 68
pvthread+004400 STACK:
[000DA4EC]et_wait+0002AC (0000000000009024, 8000000000009032,
0000000000000000 [??])
[04BA70C0]pool_kproc+000060 (??, ??, ??)
[00256C90]procentry+000010 (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 69
pvthread+004500 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 70
pvthread+004600 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 71
pvthread+004700 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 72
pvthread+004800 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 73
pvthread+004900 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 74
pvthread+004A00 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0
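
Many of the traces above repeat the same frames. As a reading aid, here is a minimal sketch that groups slots by identical call chains. It assumes the transcript looks like the output above ("(0)> stack N" followed by "[ADDRESS]function+OFFSET" lines); the helper name and regular expressions are illustrative only:

```python
import re
from collections import defaultdict

# "(0)> stack 63" starts a new trace; the slot is captured as text.
STACK_CMD = re.compile(r"^\(\d+\)>\s*stack\s+(\S+)")
# "[000D92D0]e_block_thread+000290 ()" -> captures "e_block_thread".
# Lines such as "[kdb_read_mem] no real storage ..." do not match,
# because the bracketed part is not an upper-case hex address.
FRAME = re.compile(r"^\[[0-9A-F]+\]([A-Za-z_]\w*)\+[0-9A-F]+")


def group_stacks(transcript):
    """Map each distinct tuple of frame names to the slots that showed it."""
    stacks = {}
    current = None
    for raw in transcript.splitlines():
        line = raw.strip()
        command = STACK_CMD.match(line)
        if command:
            current = command.group(1)
            stacks[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))
    groups = defaultdict(list)
    for slot, frames in stacks.items():
        groups[tuple(frames)].append(slot)
    return dict(groups)


# Excerpt copied from the traces above (slots 63, 64 and 65).
SAMPLE = """\
(0)> stack 63
pvthread+003F00 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 64
pvthread+004000 STACK:
[000D92D0]e_block_thread+000290 ()
[04821130]dogthread+000130 (??)
[002577BC]threadentry+00005C (??, ??, ??, ??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0

(0)> stack 65
pvthread+004100 STACK:
[000D92D0]e_block_thread+000290 ()
[001E8400]nsleep_com+0000C0 (??)
[001E91AC]nsleep+00006C (??, ??)
[00003888]mfspurr_sc_flih01+0000E4 ()
[D050A02C]_p_nsleep+00000C (??, ??)
[D0136D44]nsleep+0000E4 (??, ??)
[D0284BE8]sleep+000088 (??)
[100006E4]helper+000044 (??)
[D04F2B8C]_pthread_body+0000EC (??)
[kdb_read_mem] no real storage @ FFFFFFFFFFF97F0
"""

if __name__ == "__main__":
    for frames, slots in group_stacks(SAMPLE).items():
        print(", ".join(slots) + ": " + " <- ".join(frames))
```

Grouping only compares function names, so two traces that differ in offsets or arguments would still be reported together.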

Subject: Re: [rt.central.org #131725] AutoReply: rs_aix61-tl5.tar.gz (2013/08/12 15:30:49) fails with AIX 6.1 tl 7
Date: Fri, 20 Sep 2013 15:40:38 -0500
To: <openafs-bugs@openafs.org>
From: Andrew Deason <adeason@sinenomine.net>
On Wed, 18 Sep 2013 14:08:33 -0400
Brian Biswas via RT <openafs-bugs@openafs.org> wrote:

> Let us know if you need access to our aix 6.1 tl 7 machine for debugging.

I managed to get access to one, but thanks.

> Here is the requested trace:
>
> # kdb
[...]
> (0)> p * | grep afsd
[...]
> (0)> stack 21
[...]

For future reference, the instructions I gave you were wrong. But if you
run 'th * | grep afsd' (instead of 'p * | grep afsd') and otherwise
follow the same instructions, you get actually useful information. Even
if you had provided that, though, the output wouldn't have been terribly
interesting; you'd see (or at least I saw) a thread just waiting on
network i/o.

Anyway, I looked at this myself and I have a patch that fixes it for me:
<http://git.openafs.org/?p=openafs.git;a=patch;h=849e8678c8e676dbe0e219b1ecc42be042164872>

Can you please try that? If I am correct in my thinking about that, the
AIX kernel client hasn't worked for any release in the 1.6 series yet.
Can I assume that you haven't seen any 1.6 release work correctly for
the client?

--
Andrew Deason
adeason@sinenomine.net
Subject: Re: [rt.central.org #131725] AutoReply: rs_aix61-tl5.tar.gz (2013/08/12 15:30:49) fails with AIX 6.1 tl 7
Date: Mon, 23 Sep 2013 17:57:43 +0000
To: "<openafs-bugs@openafs.org>" <openafs-bugs@openafs.org>
From: "Biswas, Brian P" <bbiswas@email.unc.edu>
That worked!!! Thanks for looking into this. We were heading down the wrong path...

btw, you are correct. We never got openafs 1.6.* to work on AIX (I think the last version we
tried was 1.6.2).

--Brian
On Mon Sep 23 13:57:46 2013, bbiswas@email.unc.edu wrote:
> That worked!!! Thanks for looking into this. We were heading down the
> wrong path...

This should be fixed in the soon-to-be-released 1.6.5.1. Would you be able/willing to try that,
and make sure it works for you? If so, we would appreciate it, so we can be sure that this is
fixed prior to 1.6.5.1 being released.

If you are able to do that, you can find a tarball at
/afs/.grand.central.org/software/openafs/1.6.5.1/openafs-1.6.5.1-src.tar.gz
(note that this is NOT the final code for 1.6.5.1, but it should be very, very close). Just
compile that without any patches and tell us if it still works. If it's difficult for you to reach
that path or anything, just let me know and I can give you the tarball some other way.

--
Andrew Deason
adeason@sinenomine.net