Recently I've been trying a compromise between TOE and TEFT; if a message comes through with a confidence level under 100, I'll train on it. That hasn't really helped CRM114 converge any quicker. I think fast convergence really requires shared data sets, a la Google.
Speaking of which, I've also got a Gmail account (with the same username as this one). I use this for signing up for commercial services that I think will sell my address or otherwise be annoying, and mostly only check that address when I am expecting a particular piece of mail. I thought of giving up on maintaining my own Bayesian filters and just forwarding all my mail to Gmail (which near as I can tell uses pattern-based filtering and ever-vigilant professional pattern authors), but 4.2 doesn't talk POP over SSL, and I want to be able to read mail offline, and to search current and historic mail together. And I do like Bayesian filters' ability to give me only that portion of a mailing list's traffic that will actually interest me, even when the non-interesting parts aren't spam per se.
So, why not filter just the spam to Gmail? Forwarding just the high-confidence messages should keep my Eudora from crashing, and I'll still get the false positives on my Eudora client where I can see them. But the mailbox filter to separate low-confidence and high-confidence spam was after whatever was making Eudora crash. Fortunately, rewriting the filter in procmail wasn't too hard, and now my high-confidence spam goes to my Gmail account, where it can rot for 30 days before Google automatically deletes it.
CRM114: Crash's Bayesian mail filtering program.
TOE: Train On Error.
TEFT: Train Everything.