Rejecting Spam Early May Be Bad

Many anti-spam systems reject some or all messages at a very early stage, before the messages arrive at the mail server. There are a number of techniques to achieve this, and doing so is a big performance and efficiency win.

The major problem with early rejection is that it is hard to find out what you have rejected. We modified the standard RBL (Realtime Blackhole List) mechanism to annotate messages rather than immediately reject them. These annotations can then be picked up by the content filtering.

There are two advantages to this:

  • It allows the mechanism to be effectively tested and measured. We tried a number of RBLs and found that some (particularly those that hit a lot of spam) lead to quite a few false positives. I suspect that many "early reject" systems are bouncing quite a lot of real messages (false positives) and that this is just not noticed.
  • If messages are placed into a quarantine, it allows a user to check for false positives and to retrieve them if necessary. This can be time-consuming where there is a lot of spam, but may be the right choice for some users.
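
The annotate-and-measure idea above can be sketched roughly as follows. This is only an illustration, not the implementation described in this post: the header name X-RBL-Hit, the function names, and the example zones are all assumptions. The one convention taken as given is the usual DNSBL lookup, where the IPv4 octets are reversed and prepended to the list's zone, and a name that resolves means the address is listed.

```python
import ipaddress
import socket
from email.message import Message

# Hypothetical header name; the post does not say how annotations are stored.
ANNOTATION_HEADER = "X-RBL-Hit"


def dnsbl_query_name(client_ip, zone):
    """Build the DNS name queried for an IPv4 address in a given RBL zone."""
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dns_is_listed(query_name):
    """Treat a name that resolves as 'listed' and a failed lookup as 'not listed'."""
    try:
        socket.gethostbyname(query_name)
        return True
    except socket.gaierror:
        return False


def annotate(msg, client_ip, zones, is_listed=dns_is_listed):
    """Add one header per RBL that lists client_ip, instead of rejecting."""
    for zone in zones:
        if is_listed(dnsbl_query_name(client_ip, zone)):
            msg[ANNOTATION_HEADER] = zone
    return msg


def rbl_stats(reviewed):
    """Tally (hits, false positives) per RBL from (message, is_spam) pairs."""
    stats = {}
    for msg, is_spam in reviewed:
        for zone in msg.get_all(ANNOTATION_HEADER, []):
            hits, false_pos = stats.get(zone, (0, 0))
            stats[zone] = (hits + 1, false_pos + (0 if is_spam else 1))
    return stats
```

Because the message is delivered to the content filter or quarantine with its annotations intact, a later human or filter verdict can be fed to rbl_stats to see which lists hit real mail. With immediate rejection there is no message left to review, so the false-positive column cannot be filled in at all.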

Bouncing spam early does have performance advantages, but there are some real trade-offs.

Author: Steve Kille

3 Comments

  1. Posted July 14, 2005 at 5:54 AM | Permalink

    I’m sorry, but we get about 8 million spam emails a month for 3,500 email accounts. Rejecting spam before it even touches our network saves us enormous amounts of bandwidth and requires fewer servers to process mail.

  2. Colin Bush
    Posted July 14, 2005 at 7:49 AM | Permalink

    This is an excellent suggestion. The organizations that I work with are quite sensitive to false positive detections. End-user quarantine management seems to be the only way to truly manage this issue. In the past, we have been quite leery of RBLs and reputation-based systems for this very reason. While I can understand that blocking 8 million messages a month has major bandwidth and server savings, there are organizations that cannot risk the possibility of not receiving a legitimate e-mail.

  3. Posted July 14, 2005 at 9:33 AM | Permalink

    It’s important to separate several issues involved here. One is the use of technology to block messages without looking at the message contents. The chief purpose of this isn’t efficiency, although that is a benefit. The purpose is to detect and block those email attacks where there is no message content, namely directory harvest attacks (DHA) as well as no-content spam (no text, just a JPG). Relying solely on content analysis will let a lot of spam through, and will do nothing to stop DHAs, which currently amount to 35%-40% of email traffic.

    The second issue here is the matter of “reputation” systems like RBLs. As Kille suggested in his original posting, they are prone to false positives. That’s because RBLs are anything but “real time”. They are too static and do not cope well with the nature of today’s spam, which comes from lots of machines (zombies) that have good “reputations.” The correct approach is to analyze the BEHAVIOR of the sending machines. This eliminates the problem of false positives (or false negatives) from relying on out-of-date reputation data.
