Predicting Crime Rates Using Demographic Data and Features Derived From Social Media

Mr Tony Moriarty¹, Mr Richard Nichol¹, Mr Praveen Kumar¹, Mr Chao Sun², Dr Roman Marchant³
¹The University of Sydney, Sydney, Australia; ²Sydney Informatics Hub, The University of Sydney | Faculty of Arts and Social Science, The University of Sydney, Sydney, Australia; ³Centre for Translational Data Science, The University of Sydney, Sydney, Australia

Social media is a recent phenomenon whose usage patterns are constantly evolving, presenting both a challenge and an opportunity for the analysis of crime patterns. Social media features that may be predictive of crime rates include those derived directly from the text of posts, and text-independent features derived from metadata or data aggregations, which may indicate transient and shifting population characteristics not captured in static demographic statistics.

In this study we examine associations between social media and crime and attempt to predict the rates of certain categories of crime in NSW local government areas (LGAs).

We use Natural Language Processing techniques to analyse historical Twitter data from 2016, obtained from the Tracking Infrastructure for Social Media Analysis (TRISMA), aggregating tweets by LGA across 130 LGAs. Augmenting the models with demographic information from the Australian Bureau of Statistics 2016 Census, we model our target data: 2016 crime statistics by LGA for 6 crime categories, from the NSW Bureau of Crime Statistics and Research (BOCSAR, 2018).
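As an illustration of what aggregating tweets by LGA might look like, the following is a minimal sketch and not the study's actual pipeline. The function name, the input layout (pairs of LGA name and tweet text) and the two example features are assumptions chosen for illustration; the abstract does not list the features used.

```python
from collections import defaultdict


def aggregate_tweets_by_lga(tweets, population):
    """Build simple per-LGA features from tweets (hypothetical sketch).

    tweets: iterable of (lga_name, tweet_text) pairs
    population: dict mapping lga_name to resident count (e.g. from a census)
    """
    tweet_counts = defaultdict(int)
    word_counts = defaultdict(int)
    for lga, text in tweets:
        tweet_counts[lga] += 1
        word_counts[lga] += len(text.split())

    features = {}
    for lga, n in tweet_counts.items():
        features[lga] = {
            # Text-independent feature: tweet volume relative to population
            "tweets_per_1000": 1000.0 * n / population[lga],
            # Crude text-derived feature: average tweet length in words
            "mean_words_per_tweet": word_counts[lga] / n,
        }
    return features
```

Features of this kind could then be joined to census demographics on the LGA name before modelling.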

We find that a spatial model based entirely on Twitter data is predictive of crime rate across all 6 crime categories we analyse. We further find evidence that Twitter-derived features may enhance the accuracy of crime rate predictions for some crime categories: an ensemble improves on a demographic-only model in 4 of the 6 categories, and 2 of these improvements are significant.
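The abstract does not say how the ensemble is constructed. Purely as an assumption for illustration, one simple possibility is a weighted average of the demographic and Twitter model predictions, scored against observed rates with mean absolute error. All names below are hypothetical.

```python
def ensemble_predict(demographic_pred, twitter_pred, weight=0.5):
    """Weighted average of two models' per-LGA predictions (assumed scheme)."""
    return [
        weight * d + (1.0 - weight) * t
        for d, t in zip(demographic_pred, twitter_pred)
    ]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted rates."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Comparing the error of the ensemble with that of the demographic model alone, per crime category, would mirror the comparison reported above.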


In his former life as a Machine Learning Engineer at a Fortune 500 company, Tony Moriarty worked on projects ranging from fraud detection to predicting customer behaviour. During this time he devised a system for ranking job candidates and used behavioural analysis and outlier detection to identify employees stealing intellectual property. His Master of Data Science thesis shifted his focus more broadly to quantitative criminology.

He is currently a co-founder of a startup performing data mining in the real estate sector.
