Saturday, July 30, 2011

Researchers identify the authors of anonymous emails with 80-90% accuracy - I say that's not good enough

Originally published 3/14/11 on

At first glance it looks like a good thing. Researchers at Concordia University have devised a way to identify the authors of anonymous email. This is a great boon to prosecutors seeking to identify people who use anonymous email accounts for illegal activity. Unlike an IP address, which can only show where an email was sent from, this system identifies the author, and does so with 80-90% accuracy.

Wait a minute. 80-90% accuracy is pretty good in some contexts, but in criminal cases? The reason for the research is sound:

“In the past few years, we’ve seen an alarming increase in the number of cybercrimes involving anonymous emails,” says study co-author Benjamin Fung, a professor of Information Systems Engineering at Concordia University and an expert in data mining – extracting useful, previously unknown knowledge from a large volume of raw data. “These emails can transmit threats or child pornography, facilitate communications between criminals or carry viruses.”

On an emotional level 80-90% seems pretty good, but is it good enough when you may be taking years from a person's life? In some cases, you could be taking their life. The case of Tim Cole is one of the most prominent examples, both locally and nationally, of a person convicted on evidence that jurors thought was better than 90% accurate, but that turned out to be 100% wrong. Reading further into Concordia's press release shows that, once criminals become aware of this technique, 80-90% might be optimistic:

“Let’s say the anonymous email contains typos or grammatical mistakes, or is written entirely in lowercase letters,” says Fung. “We use those special characteristics to create a write-print. Using this method, we can even determine with a high degree of accuracy who wrote a given email, and infer the gender, nationality and education level of the author.”

So all I have to do to fool this system is vary my writing style. Intentionally misspell words in some emails and be meticulously correct in others. Make grammatical mistakes in some and not in others. Or just always make mistakes in my anonymous email that I don't usually make in my signed email.
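To see why style variation is such an easy dodge, here is a toy "write-print" in Python. It is a minimal sketch, not Concordia's actual method: the three features (an all-lowercase flag, an unknown-word rate as a stand-in for typos, and the share of sentences that start with a capital letter), the tiny `KNOWN_WORDS` dictionary, and the helper names `write_print` and `most_likely_author` are all hypothetical. Attribution here simply picks the known author whose feature vector is nearest to the anonymous email's.

```python
import math
import re

# Hypothetical stand-in for a spelling dictionary; a real system would use a full lexicon.
KNOWN_WORDS = {"the", "meeting", "is", "at", "noon", "see", "you",
               "there", "tomorrow", "i", "will", "be", "late"}

def write_print(text: str) -> list[float]:
    """Toy style vector: [all-lowercase flag, unknown-word rate, capitalized-sentence rate]."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = [c for c in text if c.isalpha()]
    all_lower = 1.0 if letters and all(c.islower() for c in letters) else 0.0
    # Words missing from the dictionary are treated as "typos" in this toy model.
    unknown_rate = sum(w.lower() not in KNOWN_WORDS for w in words) / max(len(words), 1)
    capitalized = sum(s[0].isupper() for s in sentences) / max(len(sentences), 1)
    return [all_lower, unknown_rate, capitalized]

def most_likely_author(anonymous: str, samples: dict[str, str]) -> str:
    """Attribute the text to the known author whose write-print is nearest (Euclidean distance)."""
    target = write_print(anonymous)
    return min(samples, key=lambda name: math.dist(target, write_print(samples[name])))

# Usage: the same message, written two ways, gets pinned on two different people.
samples = {
    "alice": "i will be late. see you there.",
    "bob": "The meeting is at noon. See you tomorrow.",
}
print(most_likely_author("see you at noon. i will be there.", samples))  # alice
print(most_likely_author("See you at noon. I will be there.", samples))  # bob
```

Changing nothing but capitalization moves the "author" from one person to the other, and that cuts both ways: Alice could dodge the system, or someone who imitates Bob's habits could frame him.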

Worse, given only 80-90% accuracy, how hard would it be for someone who receives a lot of email from me - or maybe even someone who reads this blog - to frame me using email? When it comes to criminal cases, 80-90% doesn't cut it.