Fighting Spam

Spam has been increasing for several years and keeps growing, but there is plenty of good Open Source software to fight it. To use that software effectively, however, you need a good understanding of how spam floods our inboxes.

How spammers collect email addresses

The most common address-gathering technique relies on viruses and spyware that, without the user's knowledge, collect email addresses from the address books of Outlook (or other email software) and send them to the spammers. Another popular technique uses web spiders to harvest email addresses from web pages and newsgroups.

How spammers send millions of emails

Spam can be sent in different ways, but the most popular one, and the one that creates the most problems for users, relies on legions of zombie PCs infected by viruses and spyware. These PCs are remotely controlled by the spammers' servers, which give each of them a certain number of target email addresses to flood with spam.
The virus-like program running on a zombie PC usually alters the email text slightly, following a set of modification rules, and inserts small random errors that are unique to each message. The purpose of these modifications is to defeat anti-spam software by preventing it from recognizing emails sent to different users as identical.
For example, a spam email sent to one user starts with:
I am the lawyer Jacopo Hilderich, owenr of the law office with the same name
while the same email sent to another user starts with:
I am the laywer Giulio Sepp, owner of the law office with the same name
Note the change in the lawyer's name and the two different misspellings ("owenr" and "laywer"). These modifications usually prevent anti-spam software from recognizing the two messages as identical. A careful user who has doubts about the email might search Google for an exact phrase taken from it but, because of these variations, will find nothing. He may then assume that nobody has reported the email as spam on the relevant forums and conclude that it might be genuine.
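The effect of these small variations can be illustrated with a short Python sketch. It compares the two opening lines quoted above, first with an exact fingerprint (a SHA-256 digest) and then with a fuzzy similarity ratio from the standard `difflib` module. The function names and the idea of comparing a ratio against a threshold are assumptions made for illustration; this is not how any particular anti-spam product works.

```python
import hashlib
from difflib import SequenceMatcher

# The two opening lines quoted in the text above.
MSG_A = "I am the lawyer Jacopo Hilderich, owenr of the law office with the same name"
MSG_B = "I am the laywer Giulio Sepp, owner of the law office with the same name"


def fingerprint(text):
    """Exact fingerprint: any single changed character gives a different digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def similarity(first, second):
    """Fuzzy similarity between 0.0 and 1.0; 1.0 means the strings are identical."""
    return SequenceMatcher(None, first, second).ratio()


if __name__ == "__main__":
    # An exact comparison treats the two messages as unrelated.
    print("same fingerprint:", fingerprint(MSG_A) == fingerprint(MSG_B))
    # A fuzzy comparison still sees that most of the text is shared.
    print("similarity ratio:", round(similarity(MSG_A, MSG_B), 2))
```

An exact-match check is defeated by a single swapped letter, while a fuzzy comparison still notices that most of the text is shared. The same reasoning explains why an exact-phrase search on Google finds nothing.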
Another important consideration is that spam is sent from infected PCs, which means that bandwidth and processing power are essentially free for spammers. This explains why these PCs generate millions of email addresses in the hope that at least a few of them are real. The addresses are not generated randomly but according to specific rules. Analyzing the log file of my own mail server, I found that a lot of spam was sent to nonexistent addresses derived from valid ones. For example, valerio@ is a valid email address, but the following logged email addresses are not
In the log I found a great many other addresses generated with similar rules, such as cutting leading characters from user names or swapping phonetically similar characters like "v" and "w", "l" and "r", "m" and "n", and so on.
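To make these rules concrete, here is a minimal Python sketch that applies the two rules just described (cutting leading characters and swapping phonetically similar characters) to a user name. The user name "marvin", the function name and the `max_cut` parameter are hypothetical. The output only illustrates the rules and does not reproduce the addresses found in my log.

```python
# Pairs of phonetically similar characters mentioned in the text.
SIMILAR = {"v": "w", "w": "v", "l": "r", "r": "l", "m": "n", "n": "m"}


def address_variants(user, max_cut=2):
    """Return plausible-looking variants of a user name (illustrative rules only)."""
    variants = set()
    # Rule 1: cut one or more leading characters.
    for cut in range(1, max_cut + 1):
        if len(user) > cut:
            variants.add(user[cut:])
    # Rule 2: swap one phonetically similar character at a time.
    for index, char in enumerate(user):
        if char in SIMILAR:
            variants.add(user[:index] + SIMILAR[char] + user[index + 1:])
    variants.discard(user)
    return variants


if __name__ == "__main__":
    # "marvin" is a made-up user name, not one taken from the log.
    for name in sorted(address_variants("marvin")):
        print(name)
```

Each valid address can produce several invalid ones in this way, which is consistent with the large share of email addressed to unknown users that a mail server sees.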

Impact on mail servers

The impact on Internet mail servers can be enormous because the quantity of spam to process is enormous. If an anti-spam filter that analyzes every message is installed on the mail server, the server load can become very high and disrupt the email service; this can happen even on very powerful servers. Spam can also consume a significant amount of bandwidth.
If we manage to refuse spam immediately, however, we can greatly reduce the workload of our mail server. If we can recognize an infected PC as a spam source when it contacts our mail server, we can refuse the connection and save both bandwidth and processing power, because the email is rejected and neither transferred nor processed.
How can we recognize that a PC contacting our mail server is a spam source? A 100% accurate answer is not possible, but to get a reasonable one we can query the Real Time Black Lists (RTBL) available on the Internet. These lists include PCs and servers that have recently been detected as spam sources. I am currently using two of them, Sorbs and SpamCop; both can be used free of charge, and many more black lists are available. The lists use many techniques, both automatic and manual, to add and remove IP addresses. One of the most common automatic techniques is the spamtrap: an email address that is published on web pages to be harvested by spammers but is not used by any real person, so that every email it receives must come from a spammer.
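Most of these lists are queried through DNS: the octets of the client IP address are reversed, the zone name of the list is appended, and the resulting name is looked up. If the name resolves (conventionally to an address in 127.0.0.0/8), the IP is listed. The Python sketch below shows the idea under some assumptions: the zone "bl.example" is a placeholder (the real zone name must be taken from the documentation of each list), the function names are hypothetical, and a failed lookup is simply treated as "not listed".

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets followed by the list zone."""
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(client_ip, zone, resolve=socket.gethostbyname):
    """Return True if the black list answers for this IP address.

    `resolve` can be replaced for testing; by default a real DNS lookup is made.
    """
    try:
        answer = resolve(dnsbl_query_name(client_ip, zone))
    except socket.gaierror:
        # No such name: the address is not on the list (or the lookup failed).
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    # "bl.example" is a placeholder zone, so this only prints the query name.
    print(dnsbl_query_name("192.0.2.1", "bl.example"))
```

In production this check is normally done by the mail server itself at connection time, not by a separate script. The sketch only shows what the lookup consists of and why it is so cheap compared to scanning a whole message.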
As an example, the company I work for has two mail servers (one primary and one secondary) and receives about 10,000 emails per day. Both servers are configured identically: the OS is Linux and the anti-spam manager is MailScanner, which manages SpamAssassin, the ClamAV antivirus and the Sendmail mail software.
Logging statistics show that on a typical day we receive the following numbers of emails:
Description                  Primary Server     %  Secondary Server     %  Total     %
Email Total                            7307  100%              3073  100%  10380  100%
Email to unknown users                 1565   21%               665   22%   2230   21%
Email from blacklisted IP              4464   61%              2087   68%   6551   63%
SpamAssassin detected spam              579    8%               268    9%    847    8%
Normal email (not spam)                 699   10%                53    2%    752    7%

These numbers come from one specific company, but they show the order of magnitude of the problem:
  • emails sent to unknown users are about three times as numerous as normal (non-spam) emails overall, and about twice as numerous on the primary server;
  • using Real Time Black Lists (Sorbs and SpamCop in this case) it is possible to block a large part of the spam immediately;
  • by refusing email sent to unknown addresses and email sent from blacklisted IPs, we cut the number of emails the server must process by more than 80%. This is very important: if we instead accepted all incoming email and ran anti-spam software over it, we would need at least 5 times the processing power we need now;
  • the share of spam is much higher on the secondary server than on the primary one (only 2% of its email is normal, versus 10% on the primary). This happens because many spam-sending programs prefer to deliver to secondary servers, hoping that they are hosted by the ISP and do not use any Real Time Black List, which would give a smoother path to the victim's inbox. For this reason it is important that the secondary (backup) mail server is managed by the same organization that manages the primary server and is configured in the same way.
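The roughly 80% reduction and the fivefold processing estimate in the list above can be checked against the totals in the table with a few lines of Python. The function and variable names are mine, and the load ratio assumes that every message costs the same to scan, which is a simplification.

```python
def early_rejection_savings(total, unknown_users, blacklisted):
    """Compute how much email is refused before any content scanning takes place."""
    refused = unknown_users + blacklisted   # rejected during the SMTP dialogue
    scanned = total - refused               # what the anti-spam software still sees
    return {
        "refused": refused,
        "scanned": scanned,
        "refused_share": refused / total,
        # How much more scanning would be needed if nothing were refused early.
        "load_ratio": total / scanned,
    }


if __name__ == "__main__":
    # Totals for both servers, taken from the table above.
    result = early_rejection_savings(total=10380, unknown_users=2230, blacklisted=6551)
    print("refused:", result["refused"], "scanned:", result["scanned"])
    print("refused share: {:.0%}".format(result["refused_share"]))
    print("load ratio: {:.1f}".format(result["load_ratio"]))
```

With these totals, early rejection removes about 85% of the incoming email, and scanning everything would mean processing about 6.5 times as many messages.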
When we use Real Time Black Lists to block spammers, some of the companies we do business with may be blocked because their server has been included in a blacklist. Sometimes our counterpart has no way of getting the server removed from the list. For this reason it is very important to maintain local whitelists that override the Internet-based blacklists and allow us to receive email from these servers.
This is especially important in Italy (where I live), because many official mail servers of Telecom Italia, Fastweb and other major ISPs are included in Internet-based blacklists. I suppose this happens because our major ISPs manage spam very poorly.
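The order of the checks is what makes a local whitelist work: it must be consulted before the blacklist, so that a trusted server is accepted even when it is listed. The Python sketch below shows that decision logic only. The function name, the parameters and the addresses (taken from the ranges reserved for documentation) are hypothetical, and it is not a Sendmail configuration.

```python
import ipaddress


def accept_connection(client_ip, whitelist, is_blacklisted):
    """Decide whether to accept a connection; the local whitelist wins over the blacklist.

    `whitelist` is a list of ipaddress networks maintained locally.
    `is_blacklisted` is any function that tells whether an IP is on a black list.
    """
    address = ipaddress.ip_address(client_ip)
    # 1. Trusted partners are accepted without consulting any black list.
    if any(address in network for network in whitelist):
        return True
    # 2. Everybody else is accepted only if not blacklisted.
    return not is_blacklisted(client_ip)


if __name__ == "__main__":
    # Hypothetical data: a partner network and a stand-in for the black list lookup.
    partners = [ipaddress.ip_network("192.0.2.0/24")]
    listed = {"192.0.2.10", "203.0.113.5"}
    for ip in ("192.0.2.10", "203.0.113.5", "198.51.100.7"):
        print(ip, accept_connection(ip, partners, lambda addr: addr in listed))
```

Because the whitelist is checked first, a whitelisted server also costs no DNS lookup at all.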
