A Comprehensive Review of Machine and Deep Learning Approaches for Cyber Security Phishing Email Detection
DOI:
https://doi.org/10.58564/IJSER.3.3.2024.219
Keywords:
Bagging Techniques, Ensemble learning, Particle swarm optimization algorithm, Phishing, Random Forests
Abstract
Over the past fifteen years, phishing has emerged as the leading cybercriminal activity, resulting in the theft of billions of dollars. This is because phishing attackers use novel (zero-day) and sophisticated tactics to deceive internet users. Email is the primary channel used to initiate phishing attacks. This study comprehensively analyzes popular methods used in email spam filtering, examining the key concepts, techniques, and research trends in the field. The discussion covers a general email spam filtering mechanism and the efforts of various researchers to counter spam using machine learning. Our review examines the advantages and disadvantages of several machine learning methods for spam filtering and addresses some of the major research questions in this domain.
License
Copyright (c) 2024 Sarmad Rashed, Caner Ozcan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.