FuzzyOCR for SpamAssassin on Ubuntu
FuzzyOCR is a plugin for SpamAssassin that analyzes the content and properties of images to distinguish between normal mail and spam.
I've been running it on some mail servers for a few months now and I'm very happy with the results. As ever, the instructions are Ubuntu-centric.
First, download the latest FuzzyOCR from http://fuzzyocr.own-hero.net/
Secondly, you need a reasonable list of prerequisites:
NetPBM Tools (apt-get install libnetpbm10 libnetpbm10-dev)
GifSicle (apt-get install gifsicle)
Next, GifLib or Libungif; it doesn't really matter which.
Download the latest version; it's a simple ./configure && make && make install
Now we need an OCR engine. I installed both Ocrad and Gocr from source, as the Ubuntu packages are a little old.
Finally, a whole bunch of Perl modules: String::Approx, Time::HiRes, MLDBM, MLDBM::Sync and Log::Agent.
Optionally, you can store the image hashes in a database. If you fancy it, install DBI (http://dbi.perl.org) and DBD::mysql.
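Before moving on, it can help to confirm that the prerequisites are actually in place. The sketch below checks for the three binaries named above and tries to load each required Perl module. The function name check_prereqs is made up for illustration, the binaries are assumed to be on the PATH under their usual lowercase names, and NetPBM and GifLib are not checked because each installs several tools.

```shell
#!/bin/sh
# check_prereqs is a hypothetical helper name, not part of FuzzyOCR.
check_prereqs() {
    missing=""

    # Binaries, assumed to be on PATH under their usual lowercase names.
    for bin in gifsicle ocrad gocr; do
        command -v "$bin" >/dev/null 2>&1 || missing="$missing $bin"
    done

    # "perl -MName -e 1" exits non-zero if the module cannot be loaded.
    for mod in String::Approx Time::HiRes MLDBM MLDBM::Sync Log::Agent; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || missing="$missing $mod"
    done

    if [ -n "$missing" ]; then
        echo "Missing:$missing"
        return 1
    fi
    echo "All listed prerequisites found"
}
```

Calling check_prereqs prints either the items it could not find or a one-line all-clear, and returns non-zero when something is missing.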
Now we can configure FuzzyOCR. Put the FuzzyOcr.cf, FuzzyOcr.scansets, FuzzyOcr.preps and FuzzyOcr.pm files, as well as the FuzzyOcr/ folder, into /etc/mail/spamassassin.
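The copy step could be scripted roughly as follows. This is only a sketch: install_fuzzyocr and its two arguments are illustrative names, and the first argument is assumed to be whatever directory you unpacked the FuzzyOCR download into.

```shell
#!/bin/sh
# install_fuzzyocr is a hypothetical helper: $1 is the unpacked FuzzyOCR
# directory, $2 the SpamAssassin config directory (defaults to the path above).
install_fuzzyocr() {
    src=$1
    dest=${2:-/etc/mail/spamassassin}

    # Copy the four top-level files, then the FuzzyOcr/ folder with its contents.
    cp "$src/FuzzyOcr.cf" "$src/FuzzyOcr.scansets" \
       "$src/FuzzyOcr.preps" "$src/FuzzyOcr.pm" "$dest/" &&
    cp -r "$src/FuzzyOcr" "$dest/"
}
```

Writing to /etc/mail/spamassassin normally needs root, so in practice the function would be run from a root shell or via sudo.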
Have a read of FuzzyOcr.cf. I made a few changes, such as changing the log directory path.
I'd recommend changing
focr_enable_image_hashing
to
focr_enable_image_hashing 2
Which stores the hashes in the MLDBM database.
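To double-check which value is actually active after editing, something like the following could be used. It is a small sketch: show_focr_setting is a made-up helper name, and the default path assumes the files were copied to /etc/mail/spamassassin as described above.

```shell
#!/bin/sh
# show_focr_setting is a hypothetical helper: $1 is the option name,
# $2 the config file (defaults to the location used in this post).
show_focr_setting() {
    key=$1
    cf=${2:-/etc/mail/spamassassin/FuzzyOcr.cf}

    # Print only active lines for the option; commented-out lines start
    # with "#" and therefore do not match.
    grep -E "^[[:space:]]*${key}([[:space:]]|\$)" "$cf"
}
```

For example, show_focr_setting focr_enable_image_hashing should print the single line ending in 2 once the change has been made.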
Create a word list. I just copied FuzzyOcr.words into /etc/mail/spamassassin.
Run spamassassin -D --lint to check that the configuration still parses cleanly.
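The debug output from a lint run is long, so it can be handy to filter it down to the plugin's lines. The sketch below assumes that debug output goes to stderr and that lines about the plugin contain its name; lint_fuzzyocr is an illustrative function name.

```shell
#!/bin/sh
# lint_fuzzyocr is a hypothetical helper around the lint run above.
lint_fuzzyocr() {
    status=0
    # Merge stderr into stdout so the debug lines can be captured.
    out=$(spamassassin -D --lint 2>&1) || status=$?

    # Show only lines mentioning the plugin (case-insensitive).
    printf '%s\n' "$out" | grep -i 'fuzzyocr' || true

    # Pass on the exit status of the lint run (0 means no errors).
    return "$status"
}
```

If the function prints nothing at all, the plugin is probably not being loaded, which usually points back to the files in /etc/mail/spamassassin.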
Now we can test. Download sample-mails.tar.gz from the FuzzyOCR page and extract it.
Finally, run
spamassassin --debug FuzzyOcr < ocr-gif.eml > /dev/null
And check for the FuzzyOCR entries in the log.
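To run the same test over every sample rather than one at a time, a loop along these lines should work. It assumes the samples extract as .eml files like ocr-gif.eml; scan_samples and its directory argument are illustrative names.

```shell
#!/bin/sh
# scan_samples is a hypothetical helper: $1 is the directory holding the
# extracted sample mails (defaults to the current directory).
scan_samples() {
    dir=${1:-.}
    for mail in "$dir"/*.eml; do
        # If the glob matched nothing, the pattern itself is left over; skip it.
        [ -f "$mail" ] || continue

        # Mark which mail the following debug output belongs to.
        echo "== $mail" >&2
        spamassassin --debug FuzzyOcr < "$mail" > /dev/null
    done
}
```

The header line for each mail goes to stderr so that it stays next to the debug output when both are redirected to a file.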
Easy, eh? If you get stuck, check out the FuzzyOCR docs.