Structural Analysis of URL For Malicious URL Detection Using Machine Learning

Authors

  • A. Saleem Raja, Information Technology Department, University of Technology and Applied Sciences, Sultanate of Oman.
  • S. Peerbasha, Department of Computer Science, Jamal Mohamed College, Affiliated to Bharathidasan University, Trichy, Tamilnadu, India.
  • Y. Mohammed Iqbal, Department of Computer Science, Jamal Mohamed College, Affiliated to Bharathidasan University, Trichy, Tamilnadu, India.
  • B. Sundarvadivazhagan, Information Technology Department, University of Technology and Applied Sciences, Sultanate of Oman.
  • M. Mohamed Surputheen, Department of Computer Science, Jamal Mohamed College, Affiliated to Bharathidasan University, Trichy, Tamilnadu, India.

DOI:

https://doi.org/10.46947/joaasr542023679

Keywords:

Malicious Link, Phishing, Natural Language Processing, Machine Learning, n-gram, Random Forest, Lexical features of URL, TF-IDF vectorizer, Count vectorizer, Hashing vectorizer

Abstract

Malicious websites are deliberately created to help online criminals carry out illicit activities, such as installing malware on a victim's computer, stealing private data from the victim's system, and exposing the victim online. Malicious code can also be found on legitimate websites. Locating such websites in cyberspace is therefore difficult and calls for an automated detection tool. Machine learning and deep learning techniques are currently used to detect malicious websites, but the problem persists because attack vectors change constantly. Most research solutions use a limited number of URL lexical features together with DNS information, global ranking information, and webpage content features. Combining several derived features costs computation time and introduces security risk. In addition, using only a minimal set of features does not exploit the full potential of the dataset. This paper addresses the problem using URLs alone, blending linguistic and vectorized URL features; vectorization allows the full potential of the URL to be used. Six machine learning algorithms are examined. The results indicate that the proposed approach performs better with the count vectorizer combined with the random forest algorithm.
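
As a rough illustration of what blending lexical and count-vectorized URL features can look like, the sketch below builds one feature row per URL using only the Python standard library. The specific lexical features, the character trigram size, the helper names (lexical_features, char_ngrams, fit_vocabulary, count_vector, blended_features), and the two sample URLs are all assumptions made for illustration; the abstract does not specify the paper's feature list, n-gram settings, or dataset.

```python
from collections import Counter
from urllib.parse import urlparse


def lexical_features(url):
    """A few hand-picked lexical features (illustrative, not the paper's list)."""
    parsed = urlparse(url)
    return {
        "url_length": len(url),
        "host_length": len(parsed.netloc),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(not ch.isalnum() for ch in url),
    }


def char_ngrams(url, n=3):
    """Overlapping character n-grams of the URL string (n=3 is an assumption)."""
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def fit_vocabulary(urls, n=3):
    """Map every n-gram seen in the training URLs to a column index."""
    grams = sorted({g for u in urls for g in char_ngrams(u, n)})
    return {g: i for i, g in enumerate(grams)}


def count_vector(url, vocab, n=3):
    """Count-vectorize one URL: occurrences of each vocabulary n-gram."""
    vec = [0] * len(vocab)
    for gram, count in Counter(char_ngrams(url, n)).items():
        if gram in vocab:  # n-grams unseen during fitting are ignored
            vec[vocab[gram]] = count
    return vec


def blended_features(url, vocab, n=3):
    """Concatenate lexical features with the n-gram count vector."""
    return list(lexical_features(url).values()) + count_vector(url, vocab, n)


# Hypothetical sample URLs, used only to show the shape of the output.
urls = ["http://example.com/login", "http://example.org/a.b.c/login1"]
vocab = fit_vocabulary(urls)
rows = [blended_features(u, vocab) for u in urls]
```

Each row in rows could then be passed, together with a label, to a classifier such as a random forest. In practice a library vectorizer would normally replace the hand-written counting shown here.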

Published

2023-07-24

How to Cite

A. Saleem Raja, S. Peerbasha, Y. Mohammed Iqbal, B. Sundarvadivazhagan, & M. Mohamed Surputheen. (2023). Structural Analysis of URL For Malicious URL Detection Using Machine Learning. JOURNAL OF ADVANCED APPLIED SCIENTIFIC RESEARCH, 5(4), 28–41. https://doi.org/10.46947/joaasr542023679