Confessions of a researchaholic

July 10, 2009

Proving human

Filed under: Real — liyiwei @ 9:54 am

In the Terminator movies, proving whether one is human or machine is crucial to the survival of humanity. But it also has humbler applications, such as fighting spam.

Spam is bad: it causes great inconvenience in our daily lives, and it even contributes to global warming.

Let me classify anti-spam techniques into two main categories: content-based and behavior-based. The former examines the content (of an email or a blog comment) and judges whether it reads more like the work of a human or of a machine (spam). The latter examines the behavior of the entity behind the email or blog comment and judges whether it is more likely a human or a machine (spammer). The content-based approach is reactive and takes place *after* the event has happened (an email sent or a blog comment posted), whereas the behavior-based approach is preventive and takes place *before* the event happens.

The content-based approach, e.g. spam filters, has been improving steadily but is not (and likely never will be) 100% accurate. We have all received spam emails that slipped through the filter (false negatives), as well as lost legitimate emails that were wrongly caught (false positives). In general, this battle is tougher for the good guys: they are tackling the (more difficult) analysis problem, whereas the evil guys only have to solve the (simpler) synthesis problem.
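To make the false negative / false positive trade-off concrete, here is a toy content-based filter. It is a minimal sketch, not how any real filter works: the keyword weights in `SPAM_WEIGHTS` and the `THRESHOLD` value are hypothetical, whereas real (e.g. statistical) filters learn their weights from labeled mail. It simply scores each message by the spam words it contains and tallies the four possible outcomes on a tiny hand-made sample.

```python
# Hypothetical keyword weights, for illustration only; real filters
# (e.g. statistical ones) learn such weights from labeled mail.
SPAM_WEIGHTS = {"free": 1.0, "winner": 2.0, "viagra": 3.0, "click": 1.0}
THRESHOLD = 2.0  # hypothetical cut-off: score >= THRESHOLD means "spam"


def spam_score(text: str) -> float:
    """Sum the weights of known spam words (a content-based analysis)."""
    return sum(SPAM_WEIGHTS.get(word, 0.0) for word in text.lower().split())


def is_spam(text: str) -> bool:
    return spam_score(text) >= THRESHOLD


def confusion_counts(samples):
    """Tally filter outcomes for (text, actually_spam) pairs."""
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for text, actually_spam in samples:
        predicted = is_spam(text)
        if predicted and actually_spam:
            counts["true_pos"] += 1
        elif predicted:
            counts["false_pos"] += 1  # legitimate mail wrongly caught
        elif actually_spam:
            counts["false_neg"] += 1  # spam that slipped through
        else:
            counts["true_neg"] += 1
    return counts


SAMPLES = [
    ("you are a winner click here", True),       # caught
    ("cheap v1agra pills", True),                # misspelling slips through
    ("free pizza friday click to rsvp", False),  # innocent mail caught
    ("meeting moved to 3pm", False),             # passed through
]

print(confusion_counts(SAMPLES))
# prints {'true_pos': 1, 'false_pos': 1, 'true_neg': 1, 'false_neg': 1}
```

The asymmetry shows up in the second sample: the spammer only needs to synthesize a one-character variant ("v1agra"), while the filter has to anticipate every such variant through analysis.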

The behavior-based approach, e.g. word verification, takes a different route. Instead of judging whether an act was performed by a human or a machine, it structures the environment so that only humans can accomplish the task. This puts some extra burden on humans (e.g. for word verification one has to read the text in a picture and type it in), but it is usually a small hassle relative to the original task (e.g. composing an email or comment). And this time the battle is easier for the good guys: they are tackling the (simpler) synthesis problem, whereas the evil guys are stuck with the (more difficult) analysis problem. (To my knowledge, no computer vision or pattern recognition technique has so far been able to break picture- or sound-based word verification.)
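The "structure the environment" idea can be sketched as a challenge-response gate that runs *before* a comment is accepted. This is a hypothetical illustration: `WordVerification` and `post_comment` are made-up names, and the genuinely hard part (rendering the code as a distorted picture or sound clip that resists recognition) is assumed to happen elsewhere and is not shown. Note how cheap the good guys' side is: synthesizing a random code takes one line.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


class WordVerification:
    """Hypothetical challenge-response gate, for illustration only."""

    def __init__(self, length: int = 6):
        self.length = length
        self._pending = {}  # challenge id -> expected answer

    def new_challenge(self) -> tuple[str, str]:
        """Synthesize a random code; the caller is assumed to render the
        answer as a distorted image or audio clip, never as plain text."""
        challenge_id = secrets.token_hex(8)
        answer = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        self._pending[challenge_id] = answer
        return challenge_id, answer

    def verify(self, challenge_id: str, response: str) -> bool:
        # pop() makes each challenge single-use, so answers cannot be replayed.
        expected = self._pending.pop(challenge_id, None)
        if expected is None:
            return False
        # Compare as bytes so non-ASCII input cannot raise an error.
        return secrets.compare_digest(
            expected.encode(), response.strip().upper().encode()
        )


def post_comment(gate, challenge_id, response, comment, comments) -> bool:
    """Accept the comment only if the challenge is passed first."""
    if not gate.verify(challenge_id, response):
        return False
    comments.append(comment)
    return True


gate = WordVerification()
cid, answer = gate.new_challenge()
comments = []
print(post_comment(gate, cid, answer.lower(), "Nice post!", comments))  # True
print(post_comment(gate, cid, answer, "replayed answer", comments))     # False
```

The check is preventive rather than reactive: a comment that fails the challenge never enters `comments` at all, so there is nothing left to filter afterwards.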

I wonder whether anti-spam research should focus more on the behavior side (human-computer interaction) than on the content side (algorithms).

P.S. This post was inspired by Ken Perlin’s blog entry on actual humans.
