Spam Classification

Written by Ted Highway. Published September 2007.



Through the use of classification techniques and forensic data gathering, we can identify specific spam groups. In some cases the identification can include a specific individual; in other cases, groups of e-mails can be positively linked to the same unspecified group. Forensic tools and techniques can allow the identification of group attributes, such as nationality, left- or right-handedness, operating system preferences, and operational habits.

Spam Organization

There are two key items for identifying individual spammers or specific spam groups: the bulk-mailing tool and the spammer’s operational habits. People who send spam generally send millions of e-mails at a time. To maintain the high volume of e-mail generation, spammers use bulk-mailing tools. These tools generate unique e-mail headers and e-mail attributes that can be used to distinguish e-mail generated by different mailing tools. Although some bulk-mailing tools do permit randomized header values, field ordering, and the like, both the set of items that can be randomized and the range of random values are limited to specific data subsets.
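One tool-level attribute is the order in which header fields appear. The following is a minimal sketch, not a description of any particular forensic tool: it assumes that header field order alone is a useful fingerprint, and the function name `header_order_fingerprint` and the sample message are made up for illustration.

```python
from email import message_from_string
from typing import Tuple


def header_order_fingerprint(raw_message: str) -> Tuple[str, ...]:
    """Return the header field names, lowercased, in the order they appear."""
    msg = message_from_string(raw_message)
    # Message.keys() preserves the order of the header fields in the message.
    return tuple(name.lower() for name in msg.keys())


# Hypothetical sample message, for illustration only.
SAMPLE = (
    "Received: from relay.example.net\n"
    "Message-ID: <abc@example.com>\n"
    "From: a@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

fingerprint = header_order_fingerprint(SAMPLE)
```

Messages that share a fingerprint are candidates for having come from the same tool. A tool that randomizes field order would defeat this particular check, so in practice it would be combined with other attributes.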

More important than the mailing tool is the fact that spammers are people, and people act consistently (until they need to change). They will use the same tools, the same systems, and the same feature subsets in the same order every time they do their work.

Identification is made simpler by the fact that most spammers appear to be cheap. Although there are commercial bulk-mailing tools, most are very expensive. Spammers would rather create their own tools or pay someone to create a cheaper tool for them. Custom tools may have a limited distribution, but different users will use the tools differently. For example, Secure Science Corporation (SSC), a San Diego, California-based technology research company, has a unique forensic research tool that generates a unique header that is used in a unique way, which in many cases makes it easy to sort and identify e-mails.

There are many different types of spam, and identifying an individual or group from an unsorted collection is very difficult. But we can filter the spam on specific features. For example, a significant number of spam messages have capital-letter hash busters (random strings added to defeat hash-based filters) located at the end of the subject line. So, we can sort the spam and look only at messages with capital-letter subject hash busters.
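The subject-line filter can be sketched as follows. The article does not give the exact format of the hash busters, so this sketch assumes one: a run of four or more capital letters at the very end of the subject, set off by whitespace. The names `HASH_BUSTER_RE` and `trailing_hash_buster` are illustrative.

```python
import re
from typing import Optional

# Assumption: a hash buster is a trailing run of 4+ capital letters
# preceded by whitespace. The real format may differ.
HASH_BUSTER_RE = re.compile(r"\s+([A-Z]{4,})\s*$")


def trailing_hash_buster(subject: str) -> Optional[str]:
    """Return the trailing capital-letter run, or None if there is none."""
    match = HASH_BUSTER_RE.search(subject)
    return match.group(1) if match else None


# Hypothetical subject lines, for illustration only.
subjects = ["Lower your mortgage rate QXZKJW", "Meeting at noon"]
flagged = [s for s in subjects if trailing_hash_buster(s) is not None]
```

A pattern this loose will also flag legitimate subjects that end in an all-caps word, so it is a first-pass sort rather than a classifier.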

By sorting the spam based on specific features, we can detect some organization. We can further examine these e-mails and look for additional common attributes. For example, a significant number of spam messages have a Date header with a time zone of -1700. On planet Earth, there is no time zone -1700 (real UTC offsets run from -1200 to +1400), so this becomes a unique attribute that can be used to further organize the spam.
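A check for impossible time zones might look like the sketch below. It assumes the Date header ends with a numeric offset such as `-1700`, optionally followed by a parenthesized comment, and it treats anything outside UTC-12:00 to UTC+14:00 as impossible. The function names and sample dates are illustrative.

```python
import re
from typing import Optional

# Numeric offset at the end of the header, optionally followed by a comment
# such as "(PDT)".
TZ_OFFSET_RE = re.compile(r"([+-])(\d{2})(\d{2})(?:\s*\([^)]*\))?\s*$")


def utc_offset_minutes(date_header: str) -> Optional[int]:
    """Return the numeric UTC offset in minutes, or None if none is found."""
    match = TZ_OFFSET_RE.search(date_header)
    if match is None:
        return None
    sign, hours, minutes = match.groups()
    total = int(hours) * 60 + int(minutes)
    return -total if sign == "-" else total


def has_impossible_offset(date_header: str) -> bool:
    """True when the offset lies outside the UTC-12:00 to UTC+14:00 range."""
    offset = utc_offset_minutes(date_header)
    if offset is None:
        return False
    return offset < -12 * 60 or offset > 14 * 60
```

For example, `has_impossible_offset("Tue, 04 Sep 2007 10:15:00 -1700")` flags the message, while the same date with `-0700` does not.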

Based on the results of this minimal organization, we can identify specific attributes of the spammer:

■ The hash buster is nearly always connected to the subject.

■ The subject typically does not end with punctuation. However, if punctuation is included, it is usually an exclamation point.

■ The messages are roughly the same length (between 50 and 140 lines, which is short compared to most spam messages).

■ Every one of the forged e-mail addresses claims to come from yahoo.com.

■ Every one of the fake account names appears to be repetitive letters followed by a number. In particular, the letters are predominantly from the left-hand side of the keyboard. This particular bulk-mailing tool requires the user to specify the fake account name. This can be done one of two ways: the user can either import a database of names or type them in by hand. In this case, the user is drumming his or her left hand on the keyboard (bcvbcv and cxzxca indicate finger drumming). With the right hand on the mouse, the user clicked the Enter key. Since the user’s right hand is on the mouse, the user is very likely right-handed.
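The left-hand observation in the last item can be turned into a rough score. This sketch assumes a US QWERTY layout and the usual touch-typing split between hands; `LEFT_HAND_KEYS` and `left_hand_ratio` are illustrative names, not part of any tool described here.

```python
# Assumption: US QWERTY layout, letters normally typed with the left hand.
LEFT_HAND_KEYS = set("qwertasdfgzxcvb")


def left_hand_ratio(account_name: str) -> float:
    """Fraction of the letters in the name that sit on the left-hand side."""
    letters = [ch for ch in account_name.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    left = sum(1 for ch in letters if ch in LEFT_HAND_KEYS)
    return left / len(letters)
```

Both examples from the text, bcvbcv and cxzxca, score 1.0, whereas an ordinary name usually mixes both hands. A high ratio across many forged account names is suggestive rather than conclusive.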

Although this spammer sends spam daily, he does take an occasional day off: for example, Thanksgiving, New Year’s Eve, the Fourth of July, a few days after Christmas, and every Raiders home game. Even though this spammer always relays through open SOCKS servers that could be located anywhere in the world, we know that the spammer is located in the United States. We can even identify the region as the Los Angeles basin, with annual travel in the spring to Chicago (for one to two months) and in the fall to Mexico City (for one to two weeks).

The main items that help in this identification are:

■ Bulk-mailing tool identification: This does not necessarily mean identifying the specific tool; rather, this is the identification of unique mailing attributes found in the e-mail header.

■ Feature subsets: Items such as hash busters (format and location), content attributes (spelling errors, grammar), and unique feature subsets from the bulk-mailing tool.

■ Sending methods: Does the spammer use open relays or compromised hosts? Is there a specific time of day that the sender prefers?

The result from this classification is a profile of the spammer and/or his spamming group.
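Combining the attributes above into a profile can be sketched as grouping messages by a tuple of features. Everything here is hypothetical: the records are hand-made, and the choice of fields in `PROFILE_FIELDS` is an assumption for illustration, not a documented method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, hand-made feature records. A real pipeline would extract
# these values from the messages themselves.
MESSAGES = [
    {"id": "m1", "hash_buster": "subject-end", "utc_offset": "-1700", "from_domain": "yahoo.com"},
    {"id": "m2", "hash_buster": "subject-end", "utc_offset": "-1700", "from_domain": "yahoo.com"},
    {"id": "m3", "hash_buster": "none", "utc_offset": "+0100", "from_domain": "example.org"},
]

# Assumption: these three fields are enough to separate the groups.
PROFILE_FIELDS = ("hash_buster", "utc_offset", "from_domain")


def group_by_profile(messages: List[dict]) -> Dict[Tuple[str, ...], List[str]]:
    """Group message ids by the tuple of their profile field values."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for message in messages:
        profile = tuple(message[field] for field in PROFILE_FIELDS)
        groups[profile].append(message["id"])
    return dict(groups)


groups = group_by_profile(MESSAGES)
```

Here m1 and m2 land in the same group because they share every profile field, while m3 forms its own group. Exact-match grouping is the simplest option; a tool that randomizes some attributes would call for a looser comparison.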

Classification Techniques

After we identify and profile individual spam groups, we can discern their intended purpose. To date, there are eight specific top-level spam classifications, including these four:

■ Unsolicited commercial e-mail (UCE): This type is generated by a true company trying to contact existing or potential customers. True UCE is extremely rare, accounting for less than one-tenth of 1 percent of all spam. (If all UCE were to vanish today, nobody would notice.)

■ Nonresponsive commercial e-mail (NCE): NCE is sent by a true company that continues to contact a user after being told to stop. The key differences between UCE and NCE are (1) the user initiated contact and (2) the user later opted out from future communication. Even though the user opted out, the NCE mailer will continue to contact the user. NCE is only a problem to people who subscribe to many services, purchase items online, or initiate contact with the NCE company.

■ List makers: These are spam groups that make money by harvesting e-mail addresses and then using the list for profit, such as selling the list to other spammers or marketing agencies.

■ Scams: Scams constitute the majority of spam. The goal of the scam is to acquire valuable assets through misrepresentation. Subsets under scams include 419 (“Nigerian-style” scams), malware, and phishing.

Phishing

Phishing is a subset of the scam category. Phishers represent themselves as respected companies (the target) to acquire customer accounts, information, or access privileges. Through the classification techniques just described, we can identify specific phishing groups. The key items for identification include:

■ Bulk-mailing tool identification and features

■ Mailing habits, including, but not limited to, their specific patterns and schedules

■ Types of systems used for sending the spam (e-mail origination host)

■ Types of systems used for hosting the phishing server

■ Layout of the hostile phishing server, including the use of HTML, JS, PHP, and other scripts

To date, according to SSC, there are an estimated four dozen phishing groups worldwide, with more than half the groups targeting customers in the United States.
