New discipline can reveal the identity of writers in the texts they write

A victim is found with a noose around his neck: he appears to have hanged himself. He was an accountant and had just transferred thousands of euros to one of his customers. In an email to the recipient, sent before his death, he apparently explains why he made the transfer. This was the case that breathed new life into the discipline of Authorship Identification at the Netherlands Forensic Institute (NFI). Experts in the field examined the email. Was it written by the victim or by someone else? Was it suicide, or was it murder or manslaughter?

“It was an unusual case”, says Wauter Bosma, one of the researchers in the Forensic Statistics and Big Data Analysis (FSBDA) field of expertise. He and his colleagues investigated the case. “The verdict reads almost like the script of a thriller.” It turned out not to be suicide. One of the intended recipients of the money was found guilty of qualified manslaughter, first by the district court and later on appeal, and was sentenced to 16 years in prison. The convicted person probably forced the victim to transfer the money or to hand over his bank card and PIN code. Based on the findings of the NFI's authorship analysis, the court concluded that the convicted person, not the victim, had written the email in order to provide an explanation for the transfer.

Mutually reinforcing

Bosma studied Information Science at the University of Twente in Enschede and wrote his PhD on automatic language processing by computer programs. He and his colleagues at FSBDA conducted the investigation with linguists from the NFI’s Speech, Language & Audio research area, such as Tina Cambier. Cambier studied linguistics and wrote her PhD on the lengthening of sounds and how that creates structure in language. “Our fields of expertise differ, but we both work with language. In investigations like these, FSBDA can support the linguists' research”, says Bosma.

Writer’s fingerprint

“Writers consciously or unconsciously leave behind a fingerprint in the texts they write”, explains Bosma. “Together with linguists, we investigate that fingerprint and make it visible.” They wanted to establish how probable it was that the email had been written by the victim rather than by someone else. “We answer questions like: does the writing style correspond more closely to the suspect’s writing style or to someone else’s?”, says Cambier.

Literary authors

In itself, authorship identification is nothing new, says Cambier: “The NFI was already doing this years ago, almost always on handwritten texts and in collaboration with handwriting experts. After handwriting analysis was discontinued as a field of expertise within the NFI, there was a long period in which no cases came in.” The science of authorship analysis did continue to develop, but outside the forensic field, Cambier explains. “For example, to try to identify the authors of books written under a pseudonym. Or to establish whether a play was written by Shakespeare or by someone else.” Bosma adds: “In the United States, researchers wanted to know who had written several of the disputed Federalist Papers, the essays that argued for ratifying the US Constitution. They conducted an authorship analysis for that purpose.”

Frequently used words

Then, in 2018, the NFI was asked to forensically examine the email in this apparent suicide. The request went to Speech, Language & Audio. Cambier knew about recent techniques that make use of Big Data and called in FSBDA for help. “We were able to use authorship recognition models that had already been developed outside the field of forensic investigation. But we did need to make sure they had evidentiary value: we had to measure how strong the similarities were”, says Bosma. Cambier: “In the kinds of cases we get, the crucial words are those not associated with a specific subject. Working with the FSBDA experts, we searched for characteristics that were not linked to content.”
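The article does not say which statistical model the NFI uses to express how strong a similarity is. One widely used approach in forensic statistics is a score-based likelihood ratio: compare how probable an observed similarity score is if two texts share an author with how probable it is if they do not. The sketch below assumes that a distance score between texts is already available (lower means more similar), that the scores under each hypothesis are roughly normally distributed, and that the background scores are invented for illustration.

```python
from statistics import NormalDist

def fit_score_model(same_author_scores, different_author_scores):
    """Fit one normal distribution per hypothesis to background distance scores.

    Assumption: scores are roughly normal under each hypothesis. The background
    pairs would come from texts whose authorship is known.
    """
    same = NormalDist.from_samples(same_author_scores)
    different = NormalDist.from_samples(different_author_scores)
    return same, different

def likelihood_ratio(score, same, different):
    """Ratio of the score's density under 'same author' vs 'different authors'."""
    return same.pdf(score) / different.pdf(score)

# Hypothetical background data: distances between pairs of texts.
same_author = [0.20, 0.25, 0.22, 0.30, 0.27, 0.24]
different_author = [0.55, 0.60, 0.48, 0.65, 0.58, 0.52]

same, different = fit_score_model(same_author, different_author)
print(f"LR for distance 0.26: {likelihood_ratio(0.26, same, different):.3g}")
```

A ratio above 1 supports the same-author hypothesis and a ratio below 1 supports the different-author hypothesis; how far from 1 it lies indicates the strength of that support. In real casework the choice of background texts matters a great deal, because it determines what counts as an ordinary, coincidental similarity.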

They focused on the most frequently used words: function words such as articles, prepositions, pronouns and common verb forms, for example ‘so, you, the, a, is, are’. Bosma adds: “Those words are used in every context, yet the frequency with which they are used varies strongly from one person to another. They help define your writing style. And because their use is so unconscious, it is hard to imitate, which makes it difficult to reproduce another person’s style.” Forensic examiners often have only small amounts of text to work with, and because these words are so common, even a short text contains enough of them to count. That makes the most frequent words a logical choice. “We use statistical models to establish evidentiary value. It comes down to counting different frequently used words”, says Bosma. That is the strength of this tool, says Cambier: “You can establish evidentiary value based on data, and that’s more objective than making your own assessment of how distinctive something is.”
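To make “counting frequently used words” concrete, here is a minimal sketch that turns a text into a profile of relative function-word frequencies and compares two profiles. The short English word list, the simple tokeniser and the choice of Manhattan distance are illustrative assumptions, not the NFI's actual model; real analyses typically use a much longer list drawn from the most frequent words in the comparison material.

```python
import re
from collections import Counter

# Hypothetical function-word list, kept short for readability.
FUNCTION_WORDS = ["the", "a", "of", "to", "and", "so", "you", "i", "is", "are", "it", "that"]

def profile(text):
    """Relative frequency of each function word among all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def manhattan(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

questioned = "So you see, the transfer is a loan and it is to be repaid."
known = "So the money is for you and it is a gift."
print(round(manhattan(profile(questioned), profile(known)), 3))
```

Because only function words are counted, the words that carry the topic (‘transfer’, ‘loan’, ‘money’) have no influence on the score, which matches the aim of finding characteristics that are not linked to content.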

Striking features

Looking for the most frequent words is not the only standard element of authorship analyses. The linguists at the NFI also look at other striking features of the research material that may help identify the writer. “We look for peculiarities, for example the use of exclamation marks, mangled expressions and language errors. The things that stand out to me are often not the function words”, says Cambier. The two types of analysis look for different things and complement each other well.

Comparative material

In the investigation into the possible suicide, the issue was whether the email was written by the victim or by one of the two beneficiaries named in it. “The police and the public prosecutor narrowed down their investigation to three possible authors of the email. It was unlikely that the email had been written by some other, as yet unidentified, person”, says Cambier. To carry out the authorship comparison, comparative material was required. Cambier: “We requested lots of emails written by each of the three individuals concerned. We are talking about hundreds of emails, which we had to compare with the text of the questioned email.” In this case, the analysis of frequently used words pointed in the same direction as the linguistic analysis of the peculiarities. “Afterwards it turned out that our findings were also in line with those of the police. For instance, the mouse was found on the left side of the keyboard and the suspect was left-handed, whereas the other two individuals concerned were not.” In addition, there were statements, and the convicted person's DNA was found in the knot of the rope.
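With a closed set of three candidate authors and many known emails from each, one classic way to rank the candidates is Burrows's Delta: standardise each function word's frequency across all known texts, then measure how far the questioned text lies from each candidate's average profile. The sketch below is a simplified version of Delta under stated assumptions (an English word list, toy texts, hypothetical candidate names); it is not the procedure used in this case.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical function-word list for illustration.
FUNCTION_WORDS = ["the", "a", "of", "to", "and", "so", "you", "i", "is", "are", "it", "that"]

def profile(text):
    """Relative frequency of each function word among all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def burrows_delta(questioned, candidates):
    """Return {candidate: Delta}; a lower Delta means a closer stylistic match.

    `candidates` maps each candidate's name to a list of their known texts.
    """
    # Average profile per candidate over all of their known texts.
    author_profiles = {
        name: [mean(col) for col in zip(*(profile(t) for t in texts))]
        for name, texts in candidates.items()
    }
    # Mean and spread of each word's frequency over every known text.
    all_profiles = [profile(t) for texts in candidates.values() for t in texts]
    mus = [mean(col) for col in zip(*all_profiles)]
    sigmas = [pstdev(col) for col in zip(*all_profiles)]

    def z(vec):
        # Words that never vary carry no information: give them z = 0.
        return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(vec, mus, sigmas)]

    zq = z(profile(questioned))
    return {
        name: mean(abs(a - b) for a, b in zip(zq, z(p)))
        for name, p in author_profiles.items()
    }

candidates = {
    "victim": ["the house is a gift for the family", "the money is for the house"],
    "beneficiary_1": ["so you are to pay it now", "so you are to pay it now"],
    "beneficiary_2": ["it is that of it and that", "that is it and i know it"],
}
print(burrows_delta("so you are to pay it now", candidates))
```

A ranking like this only says which candidate's style is closest; it does not by itself say how strong the evidence is, which in casework still has to be expressed as a calibrated statement of evidentiary value.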

Further requests

After the case of the suicide that was not a suicide, the NFI received further requests for authorship comparisons. “The experience we gained in this case also proved useful in other investigations”, says Bosma. “Many of the subsequent requests were about electronic messages. For example, WhatsApp messages sent from an anonymous device, where the police wanted to know how likely it was that messages from two different accounts had been written by the same person. Or whether a phone might have been in the possession of a different person for a particular period of time.”

Time-consuming job

The researchers almost always conduct the authorship comparisons using research material from the specific case. “The comparison material you use is crucial. For example, groups of people often share features of language use, because we adopt expressions from one another. You want to minimise the chances of a coincidental similarity.” Selecting and preparing suitable comparison material therefore takes time. “It often takes around three to four months, because you also need to analyse and clean up other material from the case. That’s quite a time-consuming job. But sometimes it yields an important piece of the puzzle in establishing the truth, so it’s worth it.”