The summary of the problem is basically this: The DirectoryService process crashes for some reason, then gets restarted by launchd. However, AFP (or more specifically, the AppleFileServer process) does not appear to regain its connection to it. This prevents any new AFP connections from authenticating, and existing ones are unable to re-authenticate. Couple this with AFP-mounted home directories, and now your users can't log in to their workstations, or their existing sessions hang.

In said discussions there are dozens of proposed workarounds. These include periodically HUP'ing the AppleFileServer process, setting up some crazy firewall rules, periodically toggling guest access, and numerous other things. I personally have tried many of them and can confidently say that none of them is a good solution. The toggling seems to mitigate the problem to some degree, but eventually things still come down hard.

One fix that appeared promising, which we tried recently, is not running Open Directory (of the network variety) on the same host as AFP. Fortunately we had a second XServe which was acting as an OD replica and not much else, so we demoted it to a server which is just connected to OD, and moved our AFP home share there. This seemed to work fine for at least a day, but then this weekend the DirectoryService process crashed yet again, causing the same problem as before.

The thing that really blows my mind about this whole issue is that people have been reporting it since November of last year. That's 5 whole months, and still no sign of a fix from Apple! Say what you will about other companies being slow to respond to problems, I've never seen a major issue like this take so long to be fixed by anyone else.

With OS X 10.5.3 being seeded to developers in the last few days, I hope that Apple finally gets on the ball and fixes this glaring problem! This is definitely one of the most frustrating problems I've encountered during my time in the computing industry.
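
To make the periodic-HUP workaround mentioned above concrete, here is a minimal Python sketch of a slightly less blunt variant: instead of HUP'ing on a fixed timer, it watches for DirectoryService coming back under a new PID and only then sends SIGHUP to AppleFileServer. It is an illustration, not a fix, and it rests on assumptions: that `ps -ax -o pid= -o comm=` is available, that the process names are exactly `DirectoryService` and `AppleFileServer`, and that a HUP helps at all (as noted above, none of these workarounds held up for long). The function names `parse_ps`, `check_once`, and `watch` are made up for this example.

```python
import os
import signal
import subprocess
import time


def parse_ps(output):
    """Map process basename -> list of PIDs, from `ps -ax -o pid= -o comm=` text."""
    table = {}
    for line in output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, command = int(parts[0]), parts[1]
        # comm may be a full path or a bare name; basename covers both.
        table.setdefault(os.path.basename(command), []).append(pid)
    return table


def check_once(previous_pid, ps_output, send_signal=os.kill):
    """HUP AppleFileServer if DirectoryService's PID changed since the last check.

    Returns the PID to remember for the next check. If DirectoryService is
    not running right now, the old PID is kept so the restart is still noticed.
    """
    table = parse_ps(ps_output)
    current_pid = table.get("DirectoryService", [None])[0]
    if current_pid is None:
        return previous_pid
    if previous_pid is not None and current_pid != previous_pid:
        # Assumption: a restart shows up as a new PID (launchd respawned it).
        for afp_pid in table.get("AppleFileServer", []):
            send_signal(afp_pid, signal.SIGHUP)
    return current_pid


def watch(interval=60):
    """Poll forever; needs to run as root to signal AppleFileServer."""
    last_pid = None
    while True:
        output = subprocess.check_output(
            ["ps", "-ax", "-o", "pid=", "-o", "comm="], universal_newlines=True
        )
        last_pid = check_once(last_pid, output)
        time.sleep(interval)
```

Calling `watch()` from a root-owned script would start the polling loop. The `send_signal` parameter exists only so the logic can be exercised without signalling real processes.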

5 comments:
This is upsetting me too. One thread I read said it was something about being PowerPC, are you using a PowerPC Xserve as well?
No, we're using Intel. So it's definitely not a PowerPC thing. The problem is supposedly fixed as of 10.5.3, and 10.5.4 is available. We'll be moving our XServe back to Leopard this weekend, so I'll post an update as to how it goes.
Yep, ditto here. We moved our file server back to Leopard and all is well now.
We had to downgrade as well back in December, was looking that day to see if anyone had gotten a fix and found out it was solved. Upgraded and haven't had any trouble for a week now. So, I think we're good. Awesome!