Hate speech spread through social media can cause personal harm and suffering as well as social tension. Social media platforms, however, are unable to regulate all of the content that users post. As a result, there is a demand for automatic hate speech detection. This demand is greater when posts are written in complex languages such as Arabic. The present study contributes to hate speech and offensive language detection for Arabic dialects, and describes the author's participation in the CERIST Natural Language Processing Challenge 2022.
We propose an approach based on deep learning and a pre-trained BERT model, built by adding GRU and LSTM layers on top of the BERT outputs. To deal with class imbalance in the dataset, two methods are proposed: the first augments the data by oversampling the minority class using translation and back-translation, and the second uses focal loss for training. The best results are 98.03% accuracy and a 98.02% F1-score with focal loss training, and 99.14% for both accuracy and F1-score with data augmentation.
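To illustrate why focal loss helps with class imbalance, the following is a minimal sketch of the standard multi-class focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), written in plain Python. It is not the paper's implementation: the function names, the value gamma = 2.0, and the optional per-class weights `alpha` are assumptions chosen for illustration, since the abstract does not state the settings used.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example.

    logits: list of raw class scores
    target: index of the true class
    gamma:  focusing parameter (assumed value; gamma = 0 gives cross-entropy)
    alpha:  optional list of per-class weights (assumed, not from the paper)
    """
    # Probability assigned to the true class, clamped to avoid log(0).
    p_t = max(softmax(logits)[target], 1e-12)
    weight = 1.0 if alpha is None else alpha[target]
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, i.e. focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)


if __name__ == "__main__":
    easy = [4.0, 0.0]  # confidently correct for class 0
    hard = [0.0, 4.0]  # confidently wrong for class 0
    for name, logits in (("easy", easy), ("hard", hard)):
        print(name, cross_entropy(logits, 0), focal_loss(logits, 0))
```

Compared with cross-entropy, the confidently correct example contributes almost nothing under focal loss, while the misclassified example keeps most of its loss. Training is therefore driven by hard examples, which in an imbalanced dataset tend to come from the minority class.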
Author: Nourelhouda Chiker