Structural Analysis of URL For Malicious URL Detection Using Machine Learning


  • A. Saleem Raja Information Technology Department, University of Technology and Applied Sciences, Sultanate of Oman.
  • S. Peerbasha Department of Computer Science, Jamal Mohamed College, Affiliated to Bharathidasan University, Trichy, Tamilnadu, India.
  • Y. Mohammed Iqbal Department of Computer Science, Jamal Mohamed College, Affiliated to Bharathidasan University, Trichy, Tamilnadu, India.
  • B. Sundarvadivazhagan Information Technology Department, University of Technology and Applied Sciences, Sultanate of Oman.
  • M. Mohamed Surputheen Department of Computer Science, Jamal Mohamed College, Affiliated to Bharathidasan University, Trichy, Tamilnadu, India.



Malicious Link, Phishing, Natural Language Processing, Machine Learning, n-gram, Random Forest, Lexical features of URL, TF-IDF vectorizer, Count vectorizer, Hashing vectorizer


Malicious websites are deliberately created to help online criminals carry out illicit activities, such as installing malware on a victim's computer, stealing private data from the victim's system, and exposing the victim online. Malicious code can also be planted on legitimate websites. Locating such websites in cyberspace is therefore difficult and calls for automated detection tools. Machine learning and deep learning techniques are currently used to detect malicious websites, but the problem persists because attack vectors keep changing. Most existing solutions rely on a limited number of URL lexical features, DNS information, global ranking information, and webpage content features. Combining several such derived features adds computation time and security risk, and relying on a small set of features does not exploit the full potential of the dataset. To address this problem, this paper uses only the URL itself and combines linguistic (lexical) URL features with vectorized URL features; vectorization allows the full information contained in the URL to be used. Six machine learning algorithms are examined. The results indicate that the proposed approach performs best when the count vectorizer is combined with the random forest algorithm.
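The abstract's central idea is to combine hand-crafted lexical features of a URL with a vectorized representation of the URL string itself. The sketch below illustrates that idea using only the Python standard library: a few lexical features are concatenated with a character n-gram count vector, which is the kind of representation a count vectorizer produces. It is not the authors' implementation; the specific lexical features, their names, and the n-gram size of 3 are assumptions chosen for illustration, since the abstract does not list the paper's exact feature set.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def lexical_features(url: str) -> dict[str, float]:
    """Hypothetical hand-crafted lexical features computed from the URL string only."""
    # urlparse needs a scheme to locate the host, so add one if it is missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots": float(url.count(".")),
        "num_hyphens": float(url.count("-")),
        "num_digits": float(sum(c.isdigit() for c in url)),
        "has_at_symbol": float("@" in url),
        "has_ip_host": float(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "uses_https": float(parsed.scheme == "https"),
    }


def char_ngrams(url: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of the lower-cased URL."""
    s = url.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def fit_vocabulary(urls: list[str], n: int = 3) -> dict[str, int]:
    """Map every n-gram seen in the training URLs to a column index."""
    vocab = sorted({g for u in urls for g in char_ngrams(u, n)})
    return {g: i for i, g in enumerate(vocab)}


def count_vectorize(url: str, vocab: dict[str, int], n: int = 3) -> list[int]:
    """Count how often each vocabulary n-gram occurs in the URL."""
    vec = [0] * len(vocab)
    for gram, count in Counter(char_ngrams(url, n)).items():
        idx = vocab.get(gram)
        if idx is not None:  # n-grams not seen during fitting are ignored
            vec[idx] = count
    return vec


def combined_features(url: str, vocab: dict[str, int], n: int = 3) -> list[float]:
    """Lexical features followed by the n-gram count vector."""
    lexical = list(lexical_features(url).values())
    return lexical + [float(c) for c in count_vectorize(url, vocab, n)]


if __name__ == "__main__":
    train_urls = [
        "https://example.com/login",
        "http://192.168.0.1/paypal-verify@secure",
    ]
    vocab = fit_vocabulary(train_urls)
    x = combined_features("http://example.com/account", vocab)
    print(len(x) == len(lexical_features("x")) + len(vocab))  # True
```

In practice, scikit-learn's `CountVectorizer(analyzer="char", ngram_range=(3, 3))` produces essentially the same n-gram counts (`TfidfVectorizer` and `HashingVectorizer` are the TF-IDF and hashing alternatives named in the keywords), and the combined feature matrix could then be passed to a classifier such as `RandomForestClassifier`. Fitting the vocabulary on training URLs only, as above, keeps test URLs from leaking into the feature space.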






How to Cite

A. Saleem Raja, S. Peerbasha, Y. Mohammed Iqbal, B. Sundarvadivazhagan, & M. Mohamed Surputheen. (2023). Structural Analysis of URL For Malicious URL Detection Using Machine Learning. JOURNAL OF ADVANCED APPLIED SCIENTIFIC RESEARCH, 5(4), 28–41.