Spam Analysis
Over the last couple of months, my parents have been receiving a lot more spam, so I’ve been adding some stricter mail filtering in Postfix and SpamAssassin. This has been pretty successful, but I’ve wanted better visibility into how many messages were rejected by each of the various checks, and to find out who the largest “offenders” were by IP. (I also wanted an easy way to look back and find any legitimate mail that might have been blocked.) I use pflogsumm to send me a basic report each night about the messages that Postfix sends and rejects. It’s OK, but it wasn’t really doing everything that I wanted.
Background
I wrote some PHP scripts to parse through my mail logs and build some reports. Every night, these scripts send the log files to a web service on my web app server. The service parses through the lines and adds the appropriate information about each rejection to a MySQL database. I also have a couple of reports that generate graphs with the Google Charts API. The PHP web app uses the CodeIgniter framework. Right now it’s not very polished, but it works decently well.
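As a rough illustration of the parsing step, here is a minimal Python sketch. It is not the PHP code described above, which isn't shown. It assumes the usual syslog-style Postfix `NOQUEUE: reject:` line, and the `Rejection` fields, the `classify` labels, and the sample addresses are all made up for the example.

```python
import re
from typing import NamedTuple, Optional


class Rejection(NamedTuple):
    timestamp: str   # raw syslog timestamp, e.g. "Jan  5 03:12:44"
    client_ip: str
    code: str        # SMTP reply code, e.g. "450"
    method: str      # hypothetical label assigned by classify()
    detail: str      # remainder of the reject message


# Assumed line shape: "<ts> <host> postfix/smtpd[pid]: NOQUEUE: reject: ..."
REJECT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ postfix/smtpd\[\d+\]: "
    r"NOQUEUE: reject: \w+ from \S*?\[(?P<ip>[^\]]+)\]: "
    r"(?P<code>\d{3}) (?:\d\.\d+\.\d+ )?(?P<detail>.*)$"
)


def classify(detail: str) -> str:
    """Map the reject text to a coarse method label (labels are illustrative)."""
    if "cannot find your hostname" in detail:
        return "reverse_dns"
    if "cannot find your reverse hostname" in detail:
        return "reverse_dns"
    if "blocked using" in detail:
        return "blocklist"
    return "other"


def parse_reject(line: str) -> Optional[Rejection]:
    """Return a Rejection for a reject line, or None for any other log line."""
    match = REJECT_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    detail = match.group("detail")
    return Rejection(
        timestamp=match.group("ts"),
        client_ip=match.group("ip"),
        code=match.group("code"),
        method=classify(detail),
        detail=detail,
    )


# Example lines using documentation-only addresses (not real log data).
SAMPLE_RDNS = (
    "Jan  5 03:12:44 mx1 postfix/smtpd[2211]: NOQUEUE: reject: RCPT from "
    "unknown[203.0.113.5]: 450 4.7.1 Client host rejected: cannot find your "
    "hostname, [203.0.113.5]; from=<a@example.com> to=<b@example.org> "
    "proto=ESMTP helo=<mail.example.com>"
)
SAMPLE_RBL = (
    "Jan  5 03:15:02 mx1 postfix/smtpd[2215]: NOQUEUE: reject: RCPT from "
    "host.example.net[198.51.100.7]: 554 5.7.1 Service unavailable; Client "
    "host [198.51.100.7] blocked using zen.spamhaus.org; "
    "from=<c@example.net> to=<b@example.org> proto=ESMTP helo=<host.example.net>"
)
```

Each parsed `Rejection` could then be written as one row in a database table. Lines that don't match return `None`, so ordinary delivery lines are skipped.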
I’ve also used Excel a bit, pulling the data into PivotTables through ODBC. This was a good way to do a quick mock-up and see what kinds of reports and information would be most useful to me.
Results
I recently enabled strict reverse DNS checking on my Postfix MX servers. By far, this seems to be blocking the most mail. (The reverse DNS checks are done first, though, so a number of these messages would likely have been blocked by other methods if the reverse DNS checks were disabled.) The vast majority of these messages were truly spam. I’ve seen a few instances where semi-legitimate mail was blocked. (It wasn’t technically spam, but the messages were for me and I can do without them.) Honestly, if someone can’t set up reverse DNS properly, they probably aren’t the kind of people I want to accept mail from.
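The post doesn't say which Postfix restriction does the reverse DNS check. As an illustration of the general idea, here is a small Python sketch of a forward-confirmed reverse DNS test: look up the PTR name for the client IP, resolve that name forward, and require that the original IP is among the results. This is not Postfix's implementation. The function names are made up, and addresses are compared as plain strings, which is a simplification that may not hold for every IPv6 spelling.

```python
import socket
from typing import Callable, List


def ptr_lookup(ip: str) -> str:
    """Reverse (PTR) lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(name: str) -> List[str]:
    """Forward lookup: hostname -> sorted list of unique address strings."""
    return sorted({info[4][0] for info in socket.getaddrinfo(name, None)})


def passes_reverse_dns(
    ip: str,
    ptr: Callable[[str], str] = ptr_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True only if the IP has a PTR name that resolves back to the same IP.

    The resolvers are injectable so the logic can be exercised without
    touching the network.
    """
    try:
        name = ptr(ip)
        addresses = forward(name)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
    return ip in addresses
```

A client with no PTR record, or whose PTR name points somewhere else, fails the test. That is the kind of sender the strict check turns away before any content filtering runs.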
I’m also using a few Spamhaus blocklists. These are catching a few messages, but not nearly as many as the reverse DNS checks.
Here’s another chart, showing the messages that were blocked each day by each method. It was interesting to see that the SpamAssassin matches went down pretty dramatically when I enabled the reverse DNS checks. (Presumably, a lot of messages that would have been marked as spam by SpamAssassin were being rejected immediately on the MX servers and never made it to SpamAssassin.)
I should also note that SpamAssassin isn’t technically rejecting these messages. These are just messages that it marks as spam and puts in people’s Junk mail folders.
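The per-day, per-method breakdown behind a chart like this comes down to a simple count, as does the "largest offenders by IP" list. Here is a minimal Python sketch of both. It assumes each record is a `(day, method, client_ip)` tuple. The method labels and the sample rows are invented placeholders, not figures from my logs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (day, method, client_ip): hypothetical shape


def daily_counts(records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    """Count messages per day and per method, e.g. for a stacked chart."""
    per_day: Dict[str, Counter] = defaultdict(Counter)
    for day, method, _ip in records:
        per_day[day][method] += 1
    return {day: dict(counts) for day, counts in per_day.items()}


def top_offenders(records: Iterable[Record], limit: int = 10) -> List[Tuple[str, int]]:
    """Return the client IPs with the most blocked messages, highest first."""
    return Counter(ip for _day, _method, ip in records).most_common(limit)


# Placeholder rows for illustration only.
SAMPLE_RECORDS: List[Record] = [
    ("2024-01-05", "reverse_dns", "203.0.113.5"),
    ("2024-01-05", "reverse_dns", "203.0.113.5"),
    ("2024-01-05", "blocklist", "198.51.100.7"),
    ("2024-01-06", "spamassassin_tagged", "192.0.2.44"),
    ("2024-01-06", "reverse_dns", "203.0.113.5"),
]
```

Keeping the SpamAssassin count under its own label, separate from the MX rejections, makes it easier to see matches shifting from one method to another when a new check is enabled.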
I’d be curious to see whether these results are comparable to what other people see on their MX servers. I would think the effectiveness of different anti-spam measures varies based on who is spamming the users. Given the relatively small number of users I host mail for, I assume my sample size is too small to give globally representative numbers.