Warning This patch has only been tested by me on RedHat 7.0 Kernel 2.2.19-7.0.8enterprise, although it ought to work on any platform that uses flock. Other folks have installed this on 2.1.3 and I have included an untested patch that should work for this. Thanks to Thomas Jarosch for tips on improving the patching process.
A diff (for patching) is available at http://servercc.oakton.edu/~jwade/cyrus/cyrus-imapd-2.0.16/cyrus-imapd-2.0.16-flock.patch
To patch, copy this to your cyrus-imapd-2.0.16 source directory and
patch -p1 < cyrus-imapd-2.0.16-flock.patch
A diff (for patching) is available at http://servercc.oakton.edu/~jwade/cyrus/cyrus-imapd-2.1.3/cyrus-imapd-2.1.3-flock.patch
To patch, copy this to your cyrus-imapd-2.1.3 source directory and
patch -p1 < cyrus-imapd-2.1.3-flock.patch
Many of the folks experiencing the problem have reduced maxtime to values as low as 17 (0+1+4+9+16=30 second total delay) and have been happy, I would recommend starting with a higher value and checking to see how long legitimate locks actually take (by reviewing the sleeping messages in your syslog.)
This happens a couple of times a week for us, and it has ceased to be an issue (we only know about it from the logs.) That being said, it is still a hack.
The symptoms we saw were not tied to the mailboxes.db or any of the
Berkeley db databases. Instead, I would have processes that would
lock
up trying to access a user's mailbox. Subsequent attempts
to access the mailbox would block waiting for the first process to release
the
locks on the various header and seen files in the mailbox.
Finally, incoming mail would die because all of the lmtpd processes would
get
blocked trying to deliver to the locked mailbox.
An easy way to see if you are experiencing this is to use gdb to debug
one of the processes and then do a bt (backtrace) if you see that the
process is in the middle of an flock call. To be absolutely
sure follow the complete steps below.
First, check to see which processes are accessing the user's seen file:
[root@borg /root]# lsof | grep jwade.seen
Then use gdb to get a backtrace of each of these processes.
Here is a
sample gdb backtrace. Just substitute the process id of
your suspect
imapd process in place of 24876
[root@borg /root]#gdb /usr/cyrus/bin/imapd 24876
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License,
and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty"
for
details.
This GDB was configured as "i386-redhat-linux"...
/root/24876: No such file or directory.
Attaching to program: /usr/cyrus/bin/imapd, Pid 24876
Reading symbols from /usr/local/lib/libsasl.so.7...done.
Loaded symbols for /usr/local/lib/libsasl.so.7
<snip, lots of symbol loading cut out here>
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
0x4028a8a1 in __flock () from /lib/libc.so.6
(gdb) bt
#0 0x4028a8a1 in __flock () from /lib/libc.so.6
#1 0x8079caf in lock_reopen (fd=22,
filename=0x810fa08 "/var/imap/user/p/jwade.seen",
sbuf=0xbfffebe0,
failaction=0xbfffebdc)
at lock_flock.c:84
#2 0x807b8c7 in starttxn_or_refetch (db=0x8111298, mytid=0x8111288)
at cyrusdb_flat.c:211
#3 0x807ba27 in myfetch (db=0x8111298, key=0x8104328 "0e461b1e389ccb86",
keylen=16, data=0xbfffecb4, datalen=0xbfffecb8,
mytid=0x8111288)
at cyrusdb_flat.c:270
#4 0x806f943 in seen_readit (seendb=0x8111278, lastreadptr=0xbfffed38,
lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40,seenuidsptr=0xbfffed44,
rw=1) at seen_db.c:268
#5 0x806fb01 in seen_lockread (seendb=0x8111278,lastreadptr=0xbfffed38,
lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40,seenuidsptr=0xbfffed44)
at seen_db.c:332
#6 0x8060daa in append_addseen (mailbox=0xbfffedb0,
userid=0xbffff05c "jwade", msgrange=0x810f930
"271") at append.c:896
#7 0x805f929 in append_commit (as=0xbfffedb0, uidvalidity=0xbfffeda4,
start=0xbfffeda8, num=0xbfffedac) at append.c:240
#8 0x8059a2f in index_copy (mailbox=0x80ffe20, sequence=0x810fbf8
"11352",
usinguid=1, name=0xbffff2c0 "user.jwade.FaMiLy",
copyuidp=0xbffff2bc)
at index.c:1226
#9 0x80531cf in cmd_copy (tag=0x810fae8 "47", sequence=0x810fbf8
"11352",
name=0x810fc68 "INBOX.FaMiLy", usinguid=1) at
imapd.c:3192
#10 0x804d85c in cmdloop () at imapd.c:791
#11 0x804cf34 in service_main (argc=1, argv=0xbffffe14, envp=0xbffffe1c)
at imapd.c:546
#12 0x804bce6 in main (argc=1, argv=0xbffffe14, envp=0xbffffe1c)
at service.c:285
#13 0x401d2f31 in __libc_start_main (main=0x804b8f8 <main>,
argc=1,
ubp_av=0xbffffe14, init=0x804a9e4 <_init>,
fini=0x807e84c <_fini>,
rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe0c)
at ../sysdeps/generic/libc-start.c:129
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or
n) y
Detaching from program: /usr/cyrus/bin/imapd, Pid 24876
The key thing here is that this process was in the lock_reopen function trying to lock the .seen file. (see the section highlighted in red)
If this is your problem, my patch might help. If you use this successfully (or if you have any problems), send me an email: jwade@oakton.edu
Enjoy,
John Wade
Director of Systems and Network Services
Oakton Community College
email: jwade@oakton.edu
http://servercc.oakton.edu/~jwade/