Computer assists NFI experts in comparing faces in photographs

Wrinkles, the position of the eyes, eyebrows and moles: these are among the characteristics that facial comparison experts at the Netherlands Forensic Institute (NFI) review when comparing individuals in photographs. The photographs often come from security cameras that have recorded individuals suspected of burglary, assault or robbery.

NFI | Example of an image comparison. The person in the photo is Arnout Ruifrok himself.

If there are doubts in a criminal case as to whether the individual in the photograph is the same individual as the suspect, the expertise of the NFI is called in. In addition to experienced experts who make visual assessments, the NFI has recently begun using a new method: software that supports the expert’s judgment.

Is the unknown individual in the disputed security camera footage the suspect pictured on the screen? The Public Prosecution Service and the defence may disagree on this point during the hearing. The court may then ask the NFI's experts to examine how probable it is that the individual in the photo is the suspect.

A trained eye

Facial comparison based on photographs is used in a wide variety of cases, ranging from robberies to debit card fraud and from common assaults to individuals who turn up as suspects in a war zone.

The crucial question is whether the individual in these images is the suspect. With almost twenty years of experience under his belt, Arnout Ruifrok has a well-trained pair of eyes. ‘I look at the shape of the face, the forehead, wrinkles, shape of the eyes, the corner of the eye, the nose, nostrils, moles and mouth. We examine the entire face and try to determine if we see any similarities – or any differences.’

The experts will give a little more weight to distinguishing details such as pigmentation marks, wrinkles and moles. ‘We also look at the shape of the skull and the hairline’, says Ruifrok. ‘The experts go through a list of characteristics.’

Experts will note down which characteristics on the list they are able to see and what the similarities and differences are. If the characteristics do not match, they will describe what the reason for this might be. It may be the angle at which the photo was taken or perhaps the difference is due to the angle of the light or due to the blurring of the images. ‘We keep all those variables in mind’, Ruifrok explains.

Humans and computers

In order to reinforce the assessment of the experts, Ruifrok has in recent years been working on developing a new, objective method: facial comparison software.

‘Imagine you have two photographs in a particular case: the disputed image of a robber and the reference image of the suspect. As an expert, I would examine the images; now, however, a software programme reviews them alongside me.’ Just like a human, the computer examines the facial characteristics. A smart algorithm has been trained specifically for that purpose by letting it review a large number of facial images in all manner of poses and lighting conditions. And just like a human, the software can distinguish characteristics, albeit more objectively. It expresses how similar the images are as a number. ‘It provides a score between 0 and 1 – 0 being completely different and 1 being exactly the same photo’, Ruifrok explains.
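The article does not describe the model's internals. In modern face comparison systems, such a score is typically derived from the similarity of learned embedding vectors; the sketch below is a minimal illustration of that idea, assuming hypothetical embeddings and a cosine-based similarity mapped to the 0-to-1 range Ruifrok mentions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range -1..1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def face_score(emb_disputed, emb_reference):
    """Map cosine similarity to a 0..1 score:
    0 = completely different, 1 = identical embedding."""
    return (cosine_similarity(emb_disputed, emb_reference) + 1) / 2

# Identical embeddings score 1.0; opposite embeddings score 0.0.
```

The embedding vectors themselves would come from a trained face recognition network; here they are stand-ins, and the mapping to 0..1 is one plausible convention, not necessarily the one the NFI software uses.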

Poor image quality

Subsequently translating that score into evidential value is a more complicated affair. Ruifrok discovered that if the software were fed two poor quality photos, it would judge them very similar: the model identifies a similarity, namely that both images are of comparably poor quality.

‘That could be blurring, poor lighting or a funny angle. So, if two images were of a similarly poor quality, the software would say they looked alike and would provide a high score’, says Ruifrok.

This was a problem he would have to tackle. He subsequently developed a method that determines the impact of the quality of the photo on the final score. Knowing the quality of an image would then tell you to what extent the score was determined by the similarities or differences in quality.

After all, two poor quality images would yield a high score, as the model would identify a high degree of similarity due to the fact that both images are blurred, for example. However, the key issue is determining whether the score is high because the faces match.

Database of volunteers

How, then, could that be determined? Ruifrok compared the poor quality images, such as those from security CCTV, to a database containing photos of various types of quality. The database only contains images of faces of people who gave their explicit consent to the use of their image for the intended purpose.

He would then compare ‘bad’ photos with the database to determine whether the quality of his image was high or low – a score between 0 and 1 was similarly used in this process. The lower the score, the fewer similarities between the images.

As such, if a contested photograph is compared with the database and the comparison yields low scores, the photo in question is of good quality: the system can distinguish it from photos of other people across a range of qualities. A high score means the software can no longer distinguish the image from other poor quality photos, and the image is wrongly confused with the faces of other people because of the circumstances – the poor quality of the photos.
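The quality check described above can be sketched as follows. This is a hypothetical simplification, not the published method: the disputed image is scored against database images of other people, and a high mean score is read as a sign that the model is matching on shared degradation (blur, lighting) rather than on the face itself:

```python
import math

def score(a, b):
    """Toy similarity score in [0, 1] from the cosine similarity of two
    embedding vectors (a stand-in for the real comparison model)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return (dot / (na * nb) + 1) / 2

def quality_proxy(disputed, database):
    """Mean score of the disputed image against database images of
    *other* people. A high mean suggests the model can no longer tell
    the image apart from everyone else (poor quality); a low mean
    suggests the image is distinguishable, i.e. of good quality."""
    scores = [score(disputed, other) for other in database]
    return sum(scores) / len(scores)
```

The function names and the use of a mean are illustrative choices; the actual method determines the impact of quality on the score in a way the article does not fully specify.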

‘Once I know what the quality of the image is, I can test it against other good quality photos, such as those on passports or driving licences. This tells me the average score of the photo in respect of the good quality photos’, he explains. ‘I can then calculate the evidential value by comparing the probability of obtaining that score if the same individual is concerned with the probability of the score if the image relates to someone else.’
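The evidential value Ruifrok describes here is a likelihood ratio: the probability of the observed score under the hypothesis that the two images show the same person, divided by its probability under the hypothesis that they show different people. A sketch of that calculation, assuming for illustration that both score distributions can be modelled as Gaussians fitted to made-up calibration scores (the published method may model them differently):

```python
import math

def gaussian_pdf(x, mean, std):
    """Probability density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(observed_score, same_scores, diff_scores):
    """Evidential value: P(score | same person) / P(score | different person),
    with each distribution fitted as a Gaussian to calibration scores."""
    def fit(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        return m, math.sqrt(v)
    m_same, s_same = fit(same_scores)
    m_diff, s_diff = fit(diff_scores)
    return gaussian_pdf(observed_score, m_same, s_same) / \
           gaussian_pdf(observed_score, m_diff, s_diff)

# Made-up calibration scores for illustration only:
same = [0.82, 0.88, 0.85, 0.90, 0.86]   # scores from same-person pairs
diff = [0.40, 0.35, 0.45, 0.38, 0.42]   # scores from different-person pairs
lr = likelihood_ratio(0.87, same, diff)  # lr >> 1 supports "same person"
```

A ratio far above 1 supports the same-source hypothesis, far below 1 the different-source hypothesis; in the method described, the calibration scores would additionally be conditioned on the estimated image quality.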

Both assessments to be published

The research shows that there is a high degree of alignment between the assessment of humans and the assessment provided by the software. Both results will be included in the expert report from now on.

Ruifrok emphasises that the judgment of experts will always continue to coexist alongside the assessment of the software. ‘As a human being, you can account for and discuss differences. “That looks like a wrinkle to me – do you see it too, or is it a shadow? Should we include it or not?” As experts, we can discuss those aspects; that is something we can’t do with a computer, which is unable to identify the grey areas.’

The new method allows the experts to better assess the evidential value of sub-optimal images. It helps them make a better-founded probability judgment on whether the findings support the hypothesis that the individuals in the photos are the same person.

‘I have been an expert in the field of facial comparison for the past twenty years, and yet I remain aware of the limitations I have as a human being, such as bias and preconceptions. You have to acknowledge these limitations, as this is how our brain works. That is why we always review facial images independently with three pairs of eyes. The fact that we now also use the comparison software as a more objective method makes for a great addition.’

Ruifrok wrote a scientific article on this method, which was published in Forensic Science International.