phi (totient) wrote,

mail complexity

It's not like I get all that much spam. But some of it causes Eudora 4.2 to crash, and I don't feel like using the adware version or paying all over again for a program that's fairly similar in features. So a while ago I installed CRM114, and I've been training it ever since. It reached 98 or 99% accuracy quickly, but has stubbornly refused to get any better than that. And the failures happen in both directions.

Recently I've been trying a compromise between TOE and TEFT: if a message comes through with a confidence level under 100, I train on it. That hasn't really helped CRM114 converge any faster. I think fast convergence really requires shared data sets, à la Google.
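As a rough illustration, the three training policies can be written as a single decision function. This is a sketch, not CRM114's own code. The `Verdict` type, the 0 to 100 confidence scale, and the policy names are assumptions made for the example. The compromise rule is read here as "train on errors, plus anything below full confidence."

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    predicted_spam: bool  # what the filter decided
    confidence: float     # hypothetical 0-100 confidence scale
    actually_spam: bool   # the reader's judgment after looking at the message


def should_train(policy: str, v: Verdict, threshold: float = 100.0) -> bool:
    """Decide whether a message should be fed back to the classifier."""
    wrong = v.predicted_spam != v.actually_spam
    if policy == "TOE":        # Train On Error: only misclassified messages
        return wrong
    if policy == "TEFT":       # Train Everything
        return True
    if policy == "THRESHOLD":  # compromise: errors plus anything below full confidence
        return wrong or v.confidence < threshold
    raise ValueError(f"unknown policy: {policy}")
```

With the default threshold of 100, the compromise rule retrains on every message the filter wasn't completely sure about. That puts it between TOE and TEFT in how much training data it generates.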

Speaking of which, I've also got a Gmail account (with the same username as this one). I use it for signing up for commercial services that I think will sell my address or otherwise be annoying, and I mostly only check that address when I'm expecting a particular piece of mail. I thought of giving up on maintaining my own Bayesian filters and just forwarding all my mail to Gmail (which, as near as I can tell, uses pattern-based filtering and ever-vigilant professional pattern authors). But Eudora 4.2 doesn't talk POP over SSL, and I want to be able to read mail offline and to search current and historic mail together. And I do like Bayesian filters' ability to give me only the portion of a mailing list's traffic that will actually interest me, even when the uninteresting parts aren't spam per se.

So, why not filter just the spam to Gmail? Forwarding only the high-confidence messages should keep Eudora from crashing, and I'll still get the false positives in my Eudora client, where I can see them. But the Eudora mailbox filter that separated low-confidence from high-confidence spam ran after whatever was making Eudora crash, so it couldn't prevent the crashes. Fortunately, rewriting the filter in procmail wasn't too hard, and now my high-confidence spam goes to my Gmail account, where it can rot for 30 days before Google automatically deletes it.
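The routing decision itself is simple. Here is a sketch of it in Python rather than procmail. It is not the actual procmail recipe. The `Destination` names and the `HIGH_CONFIDENCE` cutoff are hypothetical, since the post doesn't say exactly where the high-confidence line is drawn.

```python
from enum import Enum


class Destination(Enum):
    INBOX = "inbox"
    LOCAL_SPAM = "local spam folder"
    FORWARD_TO_GMAIL = "forward to Gmail"


# Hypothetical cutoff on the same 0-100 confidence scale the filter reports.
HIGH_CONFIDENCE = 100.0


def route(is_spam: bool, confidence: float) -> Destination:
    """Choose where a message goes, based only on the filter's verdict."""
    if not is_spam:
        return Destination.INBOX
    if confidence >= HIGH_CONFIDENCE:
        # Only spam the filter is sure about leaves the local mail path.
        return Destination.FORWARD_TO_GMAIL
    # Low-confidence spam stays local so false positives remain visible.
    return Destination.LOCAL_SPAM
```

The point of doing this at delivery time, rather than in the mail client, is that the forwarded messages never reach the client at all.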

CRM114: Crash's Bayesian mail filtering program.
TOE: Train On Error.
TEFT: Train Everything.