Monday, May 08, 2006

SpamBayes: Bayesian Anti-Spam Filter

A few months ago, I wrote an article about creating a whitelist in Outlook Express to help reduce spam. The biggest problems with whitelists are that they require a lot of maintenance and that they can filter out good messages along with the bad. So I am not a big fan of this technology.

The good news is that there is a much better alternative to whitelists, and it's called Bayesian filtering. Bayesian filters are among the best anti-spam technologies I have ever used. They work by statistically scoring the contents of each e-mail message, based on how often its words have shown up in spam versus legitimate mail.

Generally, spam messages use words and word patterns that rarely appear in legitimate e-mail. This makes it easier for the spam filter to detect and remove these e-mails.
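
To make the idea concrete, here is a minimal sketch of a naive Bayes score in Python. It is only an illustration of the general technique, not SpamBayes's actual code (its scoring is more elaborate). The function names, the add-one smoothing, and the tiny sample messages are all my own assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: simple lowercase, whitespace-separated tokens.
    return text.lower().split()

def train(messages):
    """Count in how many messages each word appears."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-word evidence into one probability that the text is spam."""
    # Start from the prior odds of spam versus good mail.
    log_odds = math.log(n_spam / n_ham)
    for word in set(tokenize(text)):
        # Add-one smoothing so unseen words never give a zero probability.
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    # Convert log-odds back into a probability between 0 and 1.
    return 1 / (1 + math.exp(-log_odds))

# Made-up training data, for illustration only.
spam = ["cheap pills buy now", "buy cheap watches now"]
ham = ["lunch meeting at noon", "notes from the meeting"]
spam_counts, ham_counts = train(spam), train(ham)

spam_score = spam_probability("buy cheap pills", spam_counts, ham_counts, len(spam), len(ham))
ham_score = spam_probability("meeting notes at lunch", spam_counts, ham_counts, len(spam), len(ham))
print(round(spam_score, 2), round(ham_score, 2))
```

Words like "buy" and "cheap" only showed up in the spam samples, so the first message scores close to 1, while the second scores close to 0.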

SpamBayes is a free, open-source Bayesian filter that can run as an Outlook plug-in under Windows. A POP3 or IMAP proxy version is also available for Windows, Linux/Unix, and Mac OS.

When you first install a Bayesian filter, you will need to train it by letting it analyze both good messages and bad messages (i.e., spam). After the initial training, it keeps learning what is and isn't spam. While it's first learning, it may make some mistakes, but its accuracy should improve over time.
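
The training step can be pictured as nothing more than keeping word counts for each kind of mail. The sketch below is a hypothetical example in Python, not how SpamBayes stores its data. The class name `TinyFilter` and its methods are invented for illustration, and the `correct` step assumes that the filter learns from its mistakes when you tell it a message was misfiled.

```python
from collections import Counter

class TinyFilter:
    """Hypothetical trainable word-count store (illustration only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        # Record each distinct word once per message under the given label.
        self.word_counts[label].update(set(text.lower().split()))
        self.message_counts[label] += 1

    def unlearn(self, text, label):
        # Undo an earlier learn() call for the same text and label.
        self.word_counts[label].subtract(set(text.lower().split()))
        self.message_counts[label] -= 1

    def correct(self, text, wrong_label, right_label):
        # Move a misfiled message from one category to the other.
        self.unlearn(text, wrong_label)
        self.learn(text, right_label)

mail_filter = TinyFilter()
mail_filter.learn("cheap pills buy now", "spam")
mail_filter.learn("lunch meeting at noon", "ham")

# Suppose a newsletter was wrongly learned as spam, and we fix it.
mail_filter.learn("weekly newsletter from the club", "spam")
mail_filter.correct("weekly newsletter from the club", "spam", "ham")
print(mail_filter.message_counts)
```

Every message you file, and every mistake you correct, nudges the counts, which is why accuracy tends to improve the longer the filter is used.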

Bayesian filters are also known for having a low 'false positive' rate. A false positive is a message that is classified as spam when it's really not.
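
If you want to measure this for your own mailbox, the false positive rate is simply the share of good messages that were flagged as spam. Here is a small Python sketch. The function name and the sample results are made up for illustration and are not measurements of any real filter.

```python
def false_positive_rate(results):
    """results: list of (actually_spam, flagged_as_spam) pairs of booleans."""
    # Only legitimate messages can produce false positives.
    flags_for_good_mail = [flagged for actually_spam, flagged in results if not actually_spam]
    if not flags_for_good_mail:
        return 0.0
    return sum(flags_for_good_mail) / len(flags_for_good_mail)

# Made-up sample: four good messages (one wrongly flagged) and two spam messages.
sample = [
    (False, False),
    (False, False),
    (False, True),   # a false positive
    (False, False),
    (True, True),
    (True, False),   # missed spam does not count as a false positive
]
rate = false_positive_rate(sample)
print(rate)
```

In this sample one of the four good messages was flagged, so the rate is 0.25. A well-trained filter should do far better than that.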

1 comment:

IlarionC said...

Bayesian seems to be the best way to go.
I am very pleased with Spambully, which uses Bayesian as well as a few other techniques.
But it's not free; it only offers a free trial.