A Comprehensive Review of Machine and Deep Learning Approaches for Cyber Security Phishing Email Detection

Sarmad Rashed; Caner Ozcan

doi:10.58564/IJSER.3.3.2024.219

Authors

Sarmad Rashed*, Department of Computer Engineering, Faculty of Engineering, Karabük University, Türkiye
Caner Ozcan, Department of Software Engineering, Faculty of Engineering, Karabük University, Türkiye

DOI:

https://doi.org/10.58564/IJSER.3.3.2024.219

Keywords:

Bagging techniques, ensemble learning, particle swarm optimization algorithm, phishing, random forests

Abstract

Over the past fifteen years, phishing has emerged as the leading cybercriminal activity, resulting in the theft of billions of dollars. Its success stems from attackers' use of novel (zero-day) and sophisticated tactics to deceive internet users, and email is the primary channel through which phishing attacks are launched. This study comprehensively reviews popular methods used in email spam filtering, examining the key concepts, techniques, and research trends in the field. It describes a general email spam filtering mechanism and surveys the efforts of various researchers to counter spam with machine learning methods. The review also weighs the advantages and disadvantages of several machine learning methods for spam filtering and addresses some of the main open research questions in this domain.
