FuzzyOCR for SpamAssassin on Ubuntu

October 10th, 2007

FuzzyOCR is a plugin for SpamAssassin that analyzes the content and properties of images to distinguish between normal mail and spam.

I’ve been running it on some mail servers for a few months now and I’m very happy with the results. As ever, the instructions are Ubuntu-centric.

First, download the latest FuzzyOCR from http://fuzzyocr.own-hero.net/

Secondly, you need a fair number of prerequisites:

NetPBM Tools (apt-get install libnetpbm10 libnetpbm10-dev)
GifSicle (apt-get install gifsicle)

Next, GifLib or Libungif; it doesn’t really matter which.
Download the latest version; it’s a simple ./configure && make && make install

Now we need an OCR engine. I installed both Ocrad and Gocr from source, as the Ubuntu packages are a little old.

Finally, a whole bunch of Perl modules: String::Approx, Time::HiRes, MLDBM, MLDBM::Sync and Log::Agent.
Optionally, you can store the image hashes in a database. If you fancy it, install DBI (http://dbi.perl.org) and DBD::mysql.
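Before going further it can help to confirm that perl can actually load those modules. The following is a minimal sketch, not part of FuzzyOCR: check_perl_modules is a made-up helper name, and it simply asks the system perl to load each module in turn.

```shell
# Report whether each named Perl module can be loaded by the system perl.
# (check_perl_modules is a hypothetical helper, not something FuzzyOCR ships.)
check_perl_modules() {
    for mod in "$@"; do
        # "perl -MSome::Module -e 1" exits non-zero if the module cannot be loaded.
        if perl -M"$mod" -e 1 >/dev/null 2>&1; then
            printf 'ok       %s\n' "$mod"
        else
            printf 'MISSING  %s\n' "$mod"
        fi
    done
}

# The modules listed above; add DBI and DBD::mysql if you want the database option.
check_perl_modules String::Approx Time::HiRes MLDBM MLDBM::Sync Log::Agent
```

Anything reported as MISSING still needs installing, for example from CPAN.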

Now we can configure FuzzyOCR. Put the FuzzyOcr.cf, FuzzyOcr.scansets, FuzzyOcr.preps and FuzzyOcr.pm files, as well as the FuzzyOcr/ folder, into /etc/mail/spamassassin.

Have a read of FuzzyOcr.cf. I made a few changes, such as changing the log directory path.
I’d recommend changing
focr_enable_image_hashing
to
focr_enable_image_hashing 2
This stores the hashes in the MLDBM database.
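If you would rather script that change than edit the file by hand, here is a rough sketch. set_cf_option is a made-up helper, not part of SpamAssassin or FuzzyOCR; it assumes the setting sits on an uncommented line of its own, and appends it if no such line exists.

```shell
# Set "key value" in a SpamAssassin-style .cf file: rewrite the existing
# uncommented line for the key, or append a new line if there is none.
# (set_cf_option is a hypothetical helper name.)
set_cf_option() {
    file=$1 key=$2 value=$3
    if grep -Eq "^[[:space:]]*${key}([[:space:]]|\$)" "$file"; then
        # -i.bak edits in place and leaves the original as "$file.bak".
        sed -E -i.bak "s|^[[:space:]]*${key}([[:space:]].*)?\$|${key} ${value}|" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Example (needs write access to the file):
# set_cf_option /etc/mail/spamassassin/FuzzyOcr.cf focr_enable_image_hashing 2
```

Commented-out lines are left alone on purpose, so the shipped comments in FuzzyOcr.cf stay intact.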

Create a word list. I just copied FuzzyOcr.words into /etc/mail/spamassassin.

Run spamassassin -D --lint

Now we can test. Download sample-mails.tar.gz from the FuzzyOCR page and extract it.
Finally, run

spamassassin --debug FuzzyOcr < ocr-gif.eml > /dev/null

And check for the FuzzyOCR entries in the log.
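To run all the sample mails in one go, something like the sketch below may save some typing. It rests on a few assumptions: the samples end in .eml like ocr-gif.eml, SpamAssassin writes its debug output to stderr, and the relevant debug lines mention FuzzyOcr. scan_samples and the SPAMASSASSIN override variable are names invented for this example.

```shell
# Run every sample mail in a directory through SpamAssassin with FuzzyOcr
# debugging on, and report how many debug lines mention FuzzyOcr per mail.
# Set SPAMASSASSIN to use a binary other than the one on your PATH.
scan_samples() {
    dir=$1
    for mail in "$dir"/*.eml; do
        [ -e "$mail" ] || continue
        # Debug output is read from stderr; the processed message on stdout is discarded.
        hits=$("${SPAMASSASSIN:-spamassassin}" --debug FuzzyOcr < "$mail" 2>&1 >/dev/null \
            | grep -ci 'fuzzyocr' || true)
        printf '%s: %s FuzzyOcr debug lines\n' "$mail" "$hits"
    done
}
```

Call it with the directory you extracted the samples into, e.g. scan_samples . for the current directory. A mail that reports 0 lines is worth running by hand to see what went wrong.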

Easy, eh? If you get stuck, check out the FuzzyOCR docs.
