DNA profiling is one of the most valuable forensic approaches: DNA database searches can generate investigative leads by retrieving suspect names, and comparing suspects with evidential traces can yield high weights of evidence. The growing number of markers in profiling systems makes DNA-profile comparison increasingly time-consuming and error-prone. Software tools can change this for the better, which prompted the development of the expert system DNAxs.
User-friendly, well-validated software that compares profiles within seconds and proceeds to statistical analysis using advanced probabilistic models will increase both the quality and the number of cases that can be handled, and will stimulate the use of such advanced statistics. Forensic experts can adopt a well-structured, uniform workflow and focus on the scientific aspects of the work, which raises the overall quality and uniformity of forensic casework.
The NFI has developed a software expert system, DNAxs, for case data management, within-case matching of profiles, and complex DNA-profile interpretation using LoCIM inference and the computation of consensus and composite profiles. The software links to other software tools such as SmartRank, Bonaparte, FDStools and CODIS. Within DNAxs, the DNAStatistX module integrates the continuous maximum likelihood ratio model from EuroForMix, so that the evidential value can be calculated with probabilistic DNA statistics with little user effort. The platform aids DNA scientists in casework by providing an overview and by managing the increasingly complex data interpretation and decision-making process. In addition, DNAxs has increased consistency and accountability while reducing errors and interpretation variability.
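To illustrate what consensus and composite profiles mean for replicate analyses, here is a minimal Python sketch. It is not DNAxs code: the function names, the data layout (one dict per replicate, mapping a locus to a set of alleles) and the default threshold of two replicates are assumptions made for this example. A composite profile keeps every allele seen in any replicate, while a consensus profile keeps only alleles that are reproduced.

```python
from collections import Counter


def composite_profile(replicates):
    """Per locus, keep every allele seen in at least one replicate."""
    loci = {locus for rep in replicates for locus in rep}
    return {
        locus: set().union(*(rep.get(locus, set()) for rep in replicates))
        for locus in sorted(loci)
    }


def consensus_profile(replicates, min_count=2):
    """Per locus, keep alleles seen in at least `min_count` replicates.

    `min_count` is a hypothetical default; laboratories set their own rule.
    """
    loci = {locus for rep in replicates for locus in rep}
    result = {}
    for locus in sorted(loci):
        # Alleles are stored in sets, so each replicate counts an allele once.
        counts = Counter(a for rep in replicates for a in rep.get(locus, set()))
        result[locus] = {a for a, n in counts.items() if n >= min_count}
    return result


# Three made-up replicate analyses of the same sample.
replicates = [
    {"D3S1358": {"15", "16"}, "vWA": {"17"}},
    {"D3S1358": {"15", "16", "17"}, "vWA": {"17", "18"}},
    {"D3S1358": {"15"}, "vWA": {"17"}},
]

composite = composite_profile(replicates)
consensus = consensus_profile(replicates)
```

In this example the allele 17 at D3S1358 and the allele 18 at vWA appear in only one replicate, so they are part of the composite profile but not of the consensus profile.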
The free version (2.0) of DNAxs (including DNAStatistX) is available on request through DNAxs@NFI.nl. You will need to sign a licence agreement before the software is made available to you. For a free copy of the standalone version of DNAStatistX, also contact DNAxs@NFI.nl.
For more information on DNAxs read our Frequently Asked Questions document (bottom of this page).
Now that DNAxs has been developed, the semi-continuous model LRmix Studio is no longer developed further, with the exception of bug fixes. LRmix Studio is no longer available from lrmixstudio.org; it has been migrated to GitHub (https://github.com/smartrank/lrmixstudio/releases), where the software can be downloaded.
Multi-laboratory validation of DNAxs and the statistical library DNAStatistX
DNAxs and DNAStatistX were used in a multi-laboratory validation study in which four laboratories participated. The institutes involved, including the organising laboratory, were the Netherlands Forensic Institute in the Netherlands, the Institute of Legal Medicine in Austria, the National Forensic Laboratory in Slovenia, the Institute of Legal Medicine (Cologne) in Germany, and the Institut National de Police Scientifique (Ecully) in France. The study was partly funded by the European Union’s Internal Security Fund – Police (Proposal Number: 820838, Proposal Acronym: DNAxs2.0).
In this study, the software was modified to read multiple data formats. First, participants performed an exercise to explore all main functionalities of DNAxs and gave feedback on user-friendliness, installation and general performance. Next, every laboratory performed likelihood ratio (LR) calculations using its own dataset and a dataset provided by the organising laboratory (NFI). The organising laboratory performed LR calculations using all datasets. The datasets were generated with different STR typing kits or analysis systems and consisted of samples varying in DNA amount, mixture ratio, number of contributors and drop-out level. Hypothesis sets used the correct number of contributors as well as under- and over-assigned numbers, with both true and false donors as the person of interest.
When the results were compared between laboratories, the LRs were for the most part within the pre-set range of variation. The few LR results that deviated more showed differences in the parameters estimated by the optimizer within DNAStatistX. Some of these were flagged by failed iteration results, others by a failed model validation, since unrealistic hypotheses were included. When the results that did not meet the quality criteria were excluded, in accordance with interpretation guidelines, none of the analyses in the different laboratories yielded a different statement in the casework report. Nonetheless, changes in software parameters were sought that minimized differences in outcomes, which made the DNAStatistX module more robust.
Overall, the software was found to be intuitive, user-friendly and valid for use in multiple laboratories. The dataset of the organising laboratory is provided to aid the implementation of the DNAxs/DNAStatistX software within other laboratories. This test dataset can be downloaded here.
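An inter-laboratory comparison of this kind can be sketched in Python as follows. This is only an illustration: the function names, the sample identifiers and the LR values are made up, and the tolerance of one unit on the log10 scale is a hypothetical placeholder, since the pre-set range used in the study is not stated here.

```python
import math


def log10_lr_difference(lr_a, lr_b):
    """Absolute difference between two LRs on the log10 scale."""
    return abs(math.log10(lr_a) - math.log10(lr_b))


def flag_deviations(results_a, results_b, max_log10_diff=1.0):
    """Return (sample, difference) pairs that exceed the allowed variation.

    `max_log10_diff` is an assumed tolerance, not the value from the study.
    Only samples analysed by both laboratories are compared.
    """
    flagged = []
    for sample in sorted(results_a.keys() & results_b.keys()):
        diff = log10_lr_difference(results_a[sample], results_b[sample])
        if diff > max_log10_diff:
            flagged.append((sample, diff))
    return flagged


# Made-up LRs for the same hypothesis sets, computed in two laboratories.
lab_a = {"mix01": 1.2e6, "mix02": 3.0e3, "mix03": 5.0e-2}
lab_b = {"mix01": 1.5e6, "mix02": 4.0e5, "mix03": 4.0e-2}

flagged = flag_deviations(lab_a, lab_b)
for sample, diff in flagged:
    print(f"{sample}: log10 LR differs by {diff:.2f}")
```

Working on the log10 scale treats LRs symmetrically, so a tenfold difference counts the same whether the LRs are in the millions or below one. Flagged samples would then be checked against quality criteria such as iteration results and model validation.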
- Flyer DNAxs workshops (see documents below)
- New DNAxs workshops will be announced on this website.
Automated estimation of the number of contributors in autosomal short tandem repeat profiles using a machine learning approach
The number of contributors (NOC) to (complex) autosomal STR profiles cannot be determined with absolute certainty due to complicating factors such as allele sharing and allelic drop-out. The precision of NOC estimations can be improved by increasing the number of (highly polymorphic) markers, by using massively parallel sequencing instead of capillary electrophoresis, and/or by using more profile information than only the allele counts. In a recent study, we focussed on machine learning approaches in order to make maximum use of the profile information. To this end, a set of 590 PowerPlex® Fusion 6C profiles with one to five contributors was generated from a total of 1174 different donors.
The profiles in this set varied in DNA template amount, mixture proportion, and levels of allele sharing, allelic drop-out and degradation. The dataset was split into a training, a test and a hold-out set. The training set was labelled with the known NOC and was used to train and optimize ten different algorithms with selection of profile characteristics. Per profile, over 250 characteristics, denoted ‘features’, were calculated. These features were based on allele counts, peak heights and allele frequencies.
The features that were most related to the NOC were selected based on partial correlation using the training set. Next, the performance of each model (a combination of features plus algorithm) was examined using the test set. A random forest classifier with 19 features, denoted the ‘RFC19-model’, showed the best performance and was selected for further validation.
Results showed improved accuracy compared to the conventional maximum allele count approach and an in-house nC-tool based on the total allele count. The method is extremely fast and is regarded as useful for application in forensic casework. The RFC19 machine learning model can be downloaded via https://github.com/JenniferVdL/NOCmodel and will be implemented in our DNA eXpert System, DNAxs.
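As a rough illustration of what per-profile features can look like, the Python sketch below derives a handful of values from allele counts and peak heights. The feature names, the data layout (locus mapped to a list of allele and peak-height pairs) and the example values are assumptions for this sketch; they are not the features from the study, and features based on allele frequencies are left out.

```python
def profile_features(profile):
    """Compute a few illustrative features from one STR profile.

    `profile` maps a locus name to a list of (allele, peak_height_rfu) pairs.
    The feature names are hypothetical, not those used in the study.
    """
    allele_counts = [len(peaks) for peaks in profile.values()]
    heights = [height for peaks in profile.values() for _, height in peaks]
    return {
        "total_allele_count": sum(allele_counts),
        "max_allele_count": max(allele_counts),
        "loci_with_more_than_two_alleles": sum(c > 2 for c in allele_counts),
        "mean_peak_height": sum(heights) / len(heights),
        "min_peak_height": min(heights),
    }


# Made-up three-locus profile with peak heights in RFU.
profile = {
    "D3S1358": [("15", 820), ("16", 790), ("17", 150)],
    "vWA": [("17", 1200), ("18", 300)],
    "FGA": [("21", 640), ("22", 610), ("24", 130), ("25", 110)],
}

features = profile_features(profile)
```

Each profile becomes one row of numbers in this way, which is the form a classifier needs for training and prediction.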
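For context, the conventional maximum allele count approach mentioned above can be sketched in a few lines of Python. It rests on the fact that a contributor carries at most two alleles per autosomal locus, so the locus with the most alleles gives a minimum number of contributors. The function name and data layout are assumptions for this example, and this is neither the in-house nC-tool nor the RFC19 model.

```python
import math


def mac_noc_estimate(profile):
    """Minimum number of contributors from the maximum allele count (MAC).

    `profile` maps a locus name to the alleles detected at that locus.
    Each contributor has at most two alleles per autosomal locus, so the
    result is ceil(MAC / 2), with a floor of one contributor.
    """
    if not profile:
        raise ValueError("profile contains no loci")
    mac = max(len(set(alleles)) for alleles in profile.values())
    return max(1, math.ceil(mac / 2))


# Made-up profile: the locus with five alleles needs at least three donors.
profile = {
    "D3S1358": ["15", "16", "17"],
    "vWA": ["14", "16", "17", "18", "19"],
    "FGA": ["21", "22"],
}

estimate = mac_noc_estimate(profile)
```

Because allele sharing and allelic drop-out both lower the observed allele count, this estimate is a lower bound and tends to underestimate the NOC for higher-order mixtures, which is why using more of the profile information can help.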
- C.C.G. Benschop, J. Hoogenboom, P. Hovers, M. Slagter, D. Kruise, R. Parag, K. Steensma, K. Slooten, J.H.A. Nagel, P. Dieltjes, V. van Marion, H. van Paassen, J. de Jong, C. Creeten, T. Sijen, A.L.J. Kneppers. DNAxs/DNAStatistX: Development and validation of a software suite for the data management and probabilistic interpretation of DNA profiles. Forensic Sci. Int. Genet. 42 (2019) 81-89.
- C.C.G. Benschop, J. van der Linden, J. Hoogenboom, R. Ypma, H. Haned. Automated estimation of the number of contributors in autosomal short tandem repeat profiles using a machine learning approach. Forensic Sci. Int. Genet. 43 (2019) 102150.
- C.C.G. Benschop, A. Nijveld, F.E. Duijs, T. Sijen. An assessment of the performance of the probabilistic genotyping software EuroForMix: trends in likelihood ratios and analysis of Type I & II errors. Forensic Sci. Int. Genet. 42 (2019) 31-38.
Download the PowerPlex Fusion 6C 2p-5p mixtures dataset as used in this study.
Link to the RFC19 NOC model: https://github.com/JenniferVdL/NOCmodel