Revue d'Information Scientifique et Technique

GigaBERT-based Approach for Hate Speech Detection in Arabic Twitter

Natural Language Processing has recently become one of the most trending research areas in Artificial Intelligence, especially in social media-related tasks. This paper describes our participation in the « Hate Speech Detection on Arabic Twitter” task at the CERIST NLP-Challenge 2022 competition. The proposed solution aims to classify the tweets collected in the Arabic ARACOVID19-MFH multi-label and multi-dialect dataset into « Hateful » and « Not Hateful » categories. Based on a pre-trained transformer model known as GigaBERT-v4, our solution outperformed the most common transformer models supporting the Arabic language. Experiments have proved that the GigaBERT-v4 model is more effective than the other models using the previously described dataset, obtaining a 99.46% accuracy and a 98.68% macro F1-score.

Auteurs : Bachir Said  , Mohammed E. Barmati

Téléchargement : PDF

Catégorie : Non classé