Synthetic dataset protects privacy in criminological research
Few public datasets are available for criminological research, especially on homicide, because privacy laws often prevent data sharing. The SENSYN project has found a solution: synthetic datasets. Marieke Liem, professor of Security and Interventions, talks about this innovation.
Why synthetic data in homicide research?
'Sensitive but valuable datasets for homicide research remain scarce. This is partly because researchers are reluctant to share their data, owing to privacy laws and the lack of a culture that encourages data sharing. This in turn affects crime and justice policies, and a lack of transparency can reduce trust in research and in the policies based on it. This is where synthetic data can help: data that is generated by an algorithm to resemble real data but cannot be traced back to real individuals. Such data can then be used to analyse crime patterns and develop police models.'
What are the results of the SENSYN-project?
'We have shown that it is possible to synthesise sensitive data, such as data on homicides. We made a synthetic dataset available on a publicly accessible website, where anyone can create their own figures, graphs and tables.
'In short, synthetic data makes it easier to share data and to apply the FAIR principles (Findable, Accessible, Interoperable and Reusable: data that both humans and computers can find, understand and use), especially when the information is sensitive. With this technique, you can create datasets that resemble real data without sharing private information.'
Public access: what benefit does it have?
'The dataset allows the general public to explore crime statistics. It can be of great benefit to policymakers and NGOs, among others, in gaining a better understanding of trends and developments in crime. The dataset is available on Zenodo, where users can carry out their own analyses of specific crime trends.'
What are the future plans for the project, and will there be possible extensions?
'We hope to use this technique to share data on specific forms of (fatal) violence. In addition, now that we have a proof of concept, we also want to apply the technique to other sensitive data, such as patient data.'
Who played a major role in this interdisciplinary collaboration?
'The success of the SENSYN project is partly due to the unique collaboration between different disciplines and institutes, including the Leiden Institute of Advanced Computer Science (LIACS), the Leiden University Medical Center (LUMC) and the Institute of Security and Global Affairs (ISGA). Interdisciplinary cooperation and thinking do not happen automatically: it is hard work. You have to learn to speak each other's "language" and understand each other's vocabulary and ways of thinking. It is hugely inspiring to work with so much talent at the university. You learn that colleagues from other disciplines look at your field in a fresh, different way, which only makes your own research better.'
Text: Job Van de Waeter