Definition of Phishing Sites Based on the Team Model of Fuzzy Neural Networks

Ilyas Idrisovich Ismagilov; Aynur Ayratovich Murtazin; Dina Vladimirovna Kataseva; Alexey Sergeevich Katasev; Andrey Igorevich Barinov

Published: Oct 31, 2020

Keywords:

Phishing Site, Knowledge Base, Fuzzy Neural Network, Team Model of Fuzzy Neural Networks, Classifier, Data Mining

Ilyas Idrisovich Ismagilov

Department of Economic Theory and Econometrics, Institute of Management, Economics and Finance, Kazan Federal University

Aynur Ayratovich Murtazin

Institute of Management, Economics and Finance, Kazan Federal University

Dina Vladimirovna Kataseva

Department of Information Security Systems, Institute of Computer Technologies and Information Security, Kazan National Research Technical University named after A.N. Tupolev "KAI"

Alexey Sergeevich Katasev

Department of Information Security Systems, Institute of Computer Technologies and Information Security, Kazan National Research Technical University named after A.N. Tupolev "KAI"

Andrey Igorevich Barinov

Department of Information Security Systems, Institute of Computer Technologies and Information Security, Kazan National Research Technical University named after A.N. Tupolev "KAI"

Abstract

This paper addresses the problem of identifying phishing sites by building a team model of fuzzy neural networks (FNNs). The main phishing methods are analyzed, with attention to the fact that phishing has become widespread on the Internet through the use of phishing sites. The expediency of identifying phishing sites by analyzing their URLs is noted, and the main approaches to identifying phishing sites are described. The paper substantiates the need for a machine learning approach in which fuzzy neural networks are constructed to create fuzzy knowledge bases that are then used to identify phishing sites. Automating the identification of phishing sites with the neuro-fuzzy approach required collecting and preparing the initial data for analysis, building a team model of fuzzy neural networks, forming a fuzzy knowledge base, and conducting research to assess the accuracy of identifying phishing sites with the constructed model. The initial data was collected from various sources and totaled 50,000 records. An expert selected 10 input features for analysis. After correlation analysis, the 4 most informative input features were retained: site lifetime, site rank, URL length, and the registration status of the site. The output feature was the site type: phishing or legitimate. After quality assessment and cleaning of the selected data, the resulting sample contained 34,718 rows, of which 70% (24,303 rows) were used for training and 30% (10,415 rows) for testing. A team model of fuzzy neural networks was built on this data, and a knowledge base of 4,608 fuzzy rules was formed. The studies showed a type I error rate of 2.01% and a type II error rate of 2.89% in identifying phishing sites, for an overall classification error of 4.9% based on the knowledge base rules. The accuracy of identifying phishing sites was 95.1%, which exceeds the accuracy of the other classification methods considered: multilayer neural network, decision tree, linear regression, and logistic regression. The knowledge base formed from the team model of fuzzy neural networks can be used effectively to identify phishing sites on the Internet.
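The sample sizes and error figures reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code. The helper names `split_sizes` and `overall_error` are hypothetical, and it assumes that both error rates are expressed as percentages of the whole test sample, so that they add up to the overall error.

```python
def split_sizes(total_rows: int, train_fraction: float) -> tuple[int, int]:
    """Return (train, test) row counts, rounding the training share to the nearest row."""
    train = round(total_rows * train_fraction)
    return train, total_rows - train


def overall_error(type1_pct: float, type2_pct: float) -> float:
    """Add the two error rates (assumed to be percentages of all test rows)."""
    return round(type1_pct + type2_pct, 2)


# Figures taken from the abstract.
train_rows, test_rows = split_sizes(34718, 0.70)
error_pct = overall_error(2.01, 2.89)
accuracy_pct = round(100 - error_pct, 2)

print(train_rows, test_rows)    # row counts for the 70/30 split
print(error_pct, accuracy_pct)  # overall error and accuracy, in percent
```

Under these assumptions the computed values agree with the abstract: 24,303 training rows, 10,415 testing rows, an overall error of 4.9%, and an accuracy of 95.1%.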

How to Cite

Ilyas Idrisovich Ismagilov, Aynur Ayratovich Murtazin, Dina Vladimirovna Kataseva, Alexey Sergeevich Katasev, & Andrey Igorevich Barinov. (2020). Definition of Phishing Sites Based on the Team Model of Fuzzy Neural Networks. Helix - The Scientific Explorer | Peer Reviewed Bimonthly International Journal, 10(05), 133-140. Retrieved from https://helixscientific.pub/index.php/home/article/view/237

Issue

Vol. 10 No. 05 (2020): Volume No 10 Issue No 05

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
