Text Based Deception Detection Using a Hashing Algorithm and Machine Learning Techniques


  • Fahad Abdulridha College of Engineering, Al-Iraqia University, Saba’a Abkar Complex, Baghdad, Iraq
  • Baraa M. Albaker College of Engineering, Al-Iraqia University, Saba’a Abkar Complex, Baghdad, Iraq




Deception Detection; Lying Detection; Machine Learning; Text Classification


One of the most challenging goals for law enforcement and courtrooms is the ability to detect deception and false information, owing to its major role in national security and the justice system. One of the most important approaches explored in recent literature to address this challenge is detecting deception with artificial intelligence techniques that find patterns in verbal and nonverbal features. Text-based analysis has been one of the most important modalities for this task, since written text is found in transcribed audio, emails, online messaging services, news articles, and many other sources. In this work, a combination of machine learning techniques and data processing using hashing algorithms is applied to an n-gram feature representation on two of the largest datasets in the deception detection field. Together, these techniques achieved an accuracy of up to 94.59%. The paper reviews the most common techniques used in recent literature and details the methodology followed for data processing and model training to achieve these results.
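The abstract does not state which hash function, n-gram range, vector width, or classifiers were used, so the following is only a minimal sketch of the general idea under stated assumptions: word unigrams and bigrams, the signed "hashing trick" with MD5 (chosen here only because it is stable across runs), 4096 hash buckets, and plain SGD logistic regression standing in for the paper's unspecified machine learning models. All function names are hypothetical. Libraries such as scikit-learn's HashingVectorizer implement the same hashing idea (with MurmurHash3) and would normally be preferred in practice.

```python
import hashlib
import math


def word_ngrams(text, n_min=1, n_max=2):
    """Lower-case, split on whitespace, and return all word n-grams of length n_min..n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def hash_ngrams(text, n_features=2**12, n_min=1, n_max=2):
    """Map n-gram counts into a fixed-size sparse vector {index: value} (signed hashing trick).

    Assumption: MD5 and 4096 buckets are illustrative choices, not the paper's settings.
    """
    vec = {}
    for gram in word_ngrams(text, n_min, n_max):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_features
        # A pseudo-random sign per n-gram lets colliding n-grams partly cancel
        # instead of always adding up.
        sign = 1.0 if digest[4] & 1 else -1.0
        vec[index] = vec.get(index, 0.0) + sign
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=20, lr=0.1):
    """SGD logistic regression on sparse dict vectors; labels are 0 or 1.

    Hypothetical stand-in for the paper's (unspecified) classifiers.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = b + sum(w.get(i, 0.0) * v for i, v in x.items())
            grad = sigmoid(z) - label  # derivative of log loss w.r.t. z
            for i, v in x.items():
                w[i] = w.get(i, 0.0) - lr * grad * v
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Return 1 (e.g. deceptive) if the decision score is positive, else 0."""
    z = b + sum(w.get(i, 0.0) * v for i, v in x.items())
    return 1 if z > 0 else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy statements for illustration only (not from the paper's datasets).
    texts = ["i was at home all night", "honestly i swear i never saw the money"]
    labels = [0, 1]  # 0 = truthful, 1 = deceptive (labels invented for this example)
    X = [hash_ngrams(t) for t in texts]
    w, b = train_logreg(X, labels)
    print(accuracy(labels, [predict(w, b, x) for x in X]))
```

The appeal of hashing for n-gram features is that the vector width is fixed in advance, so no vocabulary has to be built or stored, which helps when the n-gram space of a large corpus is very big; the cost is occasional collisions between unrelated n-grams. Any reported accuracy would of course come from held-out data, not from the training texts as in this toy run.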

How to Cite

Abdulridha, F., & Albaker, B. M. (2024). Text Based Deception Detection Using a Hashing Algorithm and Machine Learning Techniques. Al-Iraqia Journal for Scientific Engineering Research, 3(1), 87–92. https://doi.org/10.58564/IJSER.3.1.2024.148