DNA profiling is one of the most valuable forensic approaches: it can generate investigative leads by retrieving suspect names from DNA database searches, and it can yield high weights of evidence when suspects and evidential traces are compared. The growing number of markers in profiling systems makes DNA-profile comparison increasingly time-consuming and error-prone. Software tools can change this for the better, which prompted the development of the expert system DNAxs.
User-friendly, well-validated software that compares profiles within seconds and proceeds to statistical analysis according to advanced probabilistic models will increase the quality and number of cases that can be handled, and stimulate the use of such advanced statistics. Forensic experts can adopt a well-structured and uniform workflow and focus on the scientific aspects of the work, which increases the overall quality and uniformity of forensic casework.
The NFI has developed a software expert system, DNAxs, for case data management, within-case matching of profiles, and complex DNA-profile interpretation using LoCIM inference and computation of consensus and composite profiles. The software links to other tools such as SmartRank, Bonaparte, FDSTools and CODIS. Within the software, the DNAStatistX module integrates the continuous maximum likelihood ratio model from EuroForMix to calculate the evidential value using probabilistic DNA statistics. The platform aids DNA scientists in casework by providing an overview and managing the increasingly complex data interpretation and decision-making process. In addition, DNAxs has increased consistency and accountability while reducing errors and interpretation variability.
- Flyer DNAxs workshops (see documents below)
- Google form to register interest in attending the DNAxs workshop
Automated estimation of the number of contributors in autosomal short tandem repeat profiles using a machine learning approach
The number of contributors (NOC) to (complex) autosomal STR profiles cannot be determined with absolute certainty due to complicating factors such as allele sharing and allelic drop-out. The precision of NOC estimations can be improved by increasing the number of (highly polymorphic) markers, by using massively parallel sequencing instead of capillary electrophoresis, and/or by using more profile information than only the allele counts. In a recent study, we focussed on machine learning approaches in order to make maximum use of the profile information. To this end, a set of 590 PowerPlex® Fusion 6C profiles with one to five contributors was generated from a total of 1174 different donors.
This set varied in template DNA amount, mixture proportion, and levels of allele sharing, allelic drop-out and degradation. The dataset was split into a training, a test and a hold-out set. The training set was labelled with the known NOC and was used to train and optimize ten different algorithms with a selection of profile characteristics. Per profile, over 250 characteristics, denoted ‘features’, were calculated. These features were based on allele counts, peak heights and allele frequencies.
The features that were most related to the NOC were selected based on partial correlation using the training set. Next, the performance of each model (a combination of features plus algorithm) was examined using the test set. A random forest classifier with 19 features, denoted the ‘RFC19-model’, showed the best performance and was selected for further validation.
Results showed improved accuracy compared to the conventional maximum allele count approach and an in-house nC-tool based on the total allele count. The method is extremely fast and is regarded as useful for application in forensic casework. The RFC19 machine learning model can be downloaded via https://github.com/JenniferVdL/NOCmodel and will be implemented in our DNA eXpert System, DNAxs.
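As a rough illustration of what such per-profile features might look like, the sketch below computes a handful of summary values from allele counts and peak heights. The profile layout (a dict mapping a locus name to a list of allele and peak-height pairs), the function name and the feature names are assumptions made for this example; they are not the feature definitions used in the study.

```python
from statistics import mean, pstdev


def profile_features(profile):
    """Compute a few illustrative features for one STR profile.

    `profile` maps a locus name to a list of (allele, peak_height_rfu) pairs.
    This layout and the feature names are hypothetical, chosen for the example.
    The profile must contain at least one peak.
    """
    # Number of detected alleles at each locus.
    allele_counts = [len(peaks) for peaks in profile.values()]
    # All peak heights in the profile, flattened across loci.
    heights = [height for peaks in profile.values() for _, height in peaks]
    return {
        "total_allele_count": sum(allele_counts),
        "max_allele_count": max(allele_counts),
        "loci_with_more_than_2_alleles": sum(c > 2 for c in allele_counts),
        "mean_peak_height": mean(heights),
        "stdev_peak_height": pstdev(heights),
    }


# Made-up three-locus profile, for demonstration only.
example_profile = {
    "D3S1358": [("15", 1200), ("16", 950), ("17", 300)],
    "vWA": [("14", 800), ("18", 750)],
    "FGA": [("20", 400), ("22", 1100), ("23", 350), ("25", 250)],
}
features = profile_features(example_profile)
```

A feature vector of this kind, computed for every profile, is the sort of input a classifier could be trained on; features based on allele frequencies would additionally need a population frequency table, which is left out here.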
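For comparison, the conventional maximum allele count approach mentioned above can be sketched in a few lines. Because each diploid contributor carries at most two alleles per locus, the locus with the most distinct alleles gives a minimum number of contributors. The profile layout and the function name are assumptions for this example, and the sketch ignores practical complications such as stutter and drop-in.

```python
from math import ceil


def mac_noc_estimate(profile):
    """Minimum number of contributors from the maximum allele count (MAC).

    `profile` maps a locus name to a list of detected allele labels
    (hypothetical layout). Each contributor has at most two alleles per
    locus, so the estimate is ceil(MAC / 2).
    """
    max_count = max(len(set(alleles)) for alleles in profile.values())
    return ceil(max_count / 2)


# Made-up profile: the FGA locus shows five distinct alleles.
example_alleles = {
    "D3S1358": ["15", "16", "17"],
    "vWA": ["14", "18"],
    "FGA": ["20", "22", "23", "24", "25"],
}
estimate = mac_noc_estimate(example_alleles)
```

The estimate is a lower bound: allele sharing and allelic drop-out can hide contributors, which is the motivation for using more profile information than the allele counts alone.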
- C.C.G. Benschop, J. Hoogenboom, P. Hovers, M. Slagter, D. Kruise, R. Parag, K. Steensma, K. Slooten, J.H.A. Nagel, P. Dieltjes, V. van Marion, H. van Paassen, J. de Jong, C. Creeten, T. Sijen, A.L.J. Kneppers. DNAxs/DNAStatistX: Development and validation of a software suite for the data management and probabilistic interpretation of DNA profiles. Forensic Sci. Int. Genet. 42 (2019) 81-89.
- C.C.G. Benschop, J. van der Linden, J. Hoogenboom, R. Ypma, H. Haned. Automated estimation of the number of contributors in autosomal short tandem repeat profiles using a machine learning approach. Forensic Sci. Int. Genet. 43 (2019) 102150.
Link to the RFC19 NOC model: https://github.com/JenniferVdL/NOCmodel