GigaBERT-based Approach for Hate Speech Detection in Arabic Twitter

Natural Language Processing has recently become one of the most active research areas in Artificial Intelligence, especially for social media-related tasks. This paper describes our participation in the “Hate Speech Detection on Arabic Twitter” task at the CERIST NLP-Challenge 2022 competition. The proposed solution aims to classify the tweets collected in the Arabic ARACOVID19-MFH multi-label and multi-dialect dataset into “Hateful” and “Not Hateful” categories. Based on a pre-trained transformer model known as GigaBERT-v4, our solution outperformed the most common transformer models supporting the Arabic language. Experiments showed that GigaBERT-v4 is more effective than the other models on this dataset, obtaining 99.46% accuracy and a 98.68% macro F1-score.

Authors: Bachir Said, Mohammed E. Barmati

Download: PDF

XLM-T for Multilingual Sentiment Analysis in Twitter using oversampling technique

With the emergence of pre-trained language models (PLMs) and the success of large-scale models, the field of Natural Language Processing (NLP) has advanced tremendously, including in sentiment analysis (SA), one of its fastest-growing research tasks. This paper describes the system that our team submitted to the CERIST NLP Challenge for Task 1.b. The purpose of this task is to identify the sentiment polarity of English and Arabic comments collected from Twitter. Our approach is based on a PLM called XLM-T and uses oversampling to address multilingual sentiment analysis on Twitter. Experimental results show that this state-of-the-art model is robust, achieving an accuracy of 85%.
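The abstract does not say which oversampling variant was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class examples until all classes are the same size), the idea can be illustrated as follows. The function name `random_oversample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)

    # Group the examples by their label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest one.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, made-up data: three positive examples, one negative, one neutral.
texts = ["great", "love it", "nice", "bad", "meh"]
labels = ["positive", "positive", "positive", "negative", "neutral"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only, so that duplicated examples do not leak into the evaluation data.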

Authors: Mohammed E. Barmati, Bachir Said

Download: PDF

Hate speech detection model based on BERT for the Arabic dialects

Hateful speech spread through social media has the potential to cause personal harm and suffering as well as social tension. Social media platforms, however, are unable to regulate all of the content that users post. As a result, there is a demand for automatic detection of hate speech. This demand increases when the posts are written in complex languages such as Arabic. This study contributes to hate speech and offensive language detection for Arabic dialects, and describes my participation in the CERIST Natural Language Processing Challenge 2022.
We propose an approach based on deep learning and a pre-trained BERT model. This approach is built by adding GRU and LSTM layers to the BERT outputs. Additionally, to deal with class imbalance in the dataset, two methods are proposed: the first is data augmentation, oversampling the minority class using translation and back-translation, and the second uses focal loss for training. The best results reached with focal loss training are 98.03% accuracy and 98.02% F1-score, and with data augmentation, 99.14% for both accuracy and F1-score.
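For readers unfamiliar with focal loss, here is a minimal sketch of its standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), which down-weights examples the model already classifies well. The values gamma = 2.0 and alpha = 0.25 are commonly used defaults and are assumptions here; the abstract does not state which settings the author used, and the function name `binary_focal_loss` is hypothetical.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for a single example.

    p: predicted probability of the positive class (0 < p < 1)
    y: true label, 0 or 1
    gamma, alpha: assumed default values, not taken from the paper
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha

    # Clamp to avoid log(0) for extreme predictions.
    p_t = min(max(p_t, 1e-12), 1.0)

    # (1 - p_t) ** gamma shrinks the loss of confidently correct examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct
hard = binary_focal_loss(0.1, 1)  # confident and wrong
print(easy, hard)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick sanity check on any implementation.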

Authors: Nourelhouda Chiker

Download: PDF

Modeling Fake News Detection Using Machine Learning Algorithms for Arabic covid-19 Tweets

Fake news detection has become a major issue in the digital age, with social media playing a major role in its spread. This paper outlines the dataset and methodology used to model Arabic fake news, and describes our participation in the CERIST Natural Language Processing Challenge. We used the dataset provided for Task 1.c, Arabic sentiment analysis and fake news detection within COVID-19. The model used for this task is a simple transformer-based fake news model built on the Arabic pre-trained language model CAMeL-BERT. This model was used in two variants: a fine-tuned model and a bidirectional long short-term memory (BiLSTM) model. The experimental results show that CAMeL-BERT provides the best result, achieving an F1 of 0.959 and outperforming all other model variants in detecting fake news.

Authors: Mohammed Aldawsari, Omer Salih Dawood Omer, Yousra F.G. Elhakeem, Safa Eltayeb

Download: PDF

Arabic Sentiment Analysis within COVID-19

In this paper, we present a brief study that analyzes Arabic tweets posted during the COVID-19 period and classifies them as positive, negative, or neutral. This paper describes our participation in the CERIST Natural Language Processing Challenge. We worked on a dataset of 4,800 tuples, to which we applied three different approaches: Naive Bayes, a neural network, and stochastic gradient descent (SGD). The last of these gave the best result, with an accuracy of 91%.

Authors: Slimane Arbaoui, Alaa Eddine Belfedhal

Download: PDF

Exploring Chinese innovation through patent information: hegemony or knowledge manipulation?

In this article, we analyze China's innovative power. We ask whether this country, which in a few years has become the world's largest patent applicant, represents a genuine reservoir of effective invention or a strategy of knowledge manipulation on a global scale. In other words, has China, once described as the factory of the world, become a true engine of global R&D? The aim of this article is to understand how patent information is used by researchers and to determine what proportion of the surge in Chinese patents consists of value-added innovations.

Authors: Nour-Eddine Aissaoui

Download: PDF

Introduction to Big Data: Concepts and Technologies

In recent years, the term Big Data has become widespread, and the largest companies and data providers in the world have already adopted it. This phenomenon, which has changed the world, emerged from the explosion of digital data and the inability of traditional systems to manage such enormous quantities of data. Indeed, Google, Yahoo, and other web companies were the first to face the problem of scaling their systems, which motivated the development of the first Big Data projects. Several more projects were subsequently developed to meet the demands of increasingly massive data. This article is an introduction to Big Data and its recent technologies.

Authors: Faiza Deghmani

Download: PDF

Using Genetic Algorithms to Improve Information Retrieval

Finding valuable, relevant information remains one of the major challenges of information retrieval systems, owing to the explosive growth of online information. Among these challenges, we consider XML information retrieval, as XML has become a de facto standard on the Web. In this paper, we tackle content-based XML information retrieval. We formulate retrieval as a combinatorial optimization problem in order to generate the best set of relevant XML elements for a given keyword query.
In our proposal, we define a genetic algorithm that maximizes the similarity between a set of XML elements and the user query. The results, based on the precision measure, are very promising.
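To make the combinatorial formulation concrete, the following is a rough sketch of a genetic algorithm that selects a subset of candidate elements for a keyword query. It is not the authors' algorithm: the fitness function (query-term coverage minus a small penalty per selected element), the operators (truncation selection, one-point crossover, bit-flip mutation), all parameter values, and the toy data are assumptions chosen only for illustration.

```python
import random


def fitness(mask, elements, query, size_penalty=0.05):
    """Hypothetical fitness: fraction of query terms covered by the
    selected elements, minus a small cost per selected element."""
    chosen = [terms for bit, terms in zip(mask, elements) if bit]
    if not chosen:
        return 0.0
    covered = set().union(*chosen) & query
    return len(covered) / len(query) - size_penalty * len(chosen)


def evolve(elements, query, pop_size=30, generations=40,
           mutation_rate=0.1, seed=0):
    """Evolve bit masks over the candidate elements (1 = element selected)."""
    rng = random.Random(seed)
    n = len(elements)

    def score(mask):
        return fitness(mask, elements, query)

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation on each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)


# Toy candidates: each XML element is reduced to its set of terms.
elements = [
    {"genetic", "algorithm"},
    {"xml", "retrieval"},
    {"cooking", "recipes"},
    {"information", "retrieval", "xml"},
    {"genetic", "algorithm", "optimization"},
    {"weather"},
]
query = {"xml", "retrieval", "genetic", "algorithm"}

best = evolve(elements, query)
print(best, fitness(best, elements, query))
```

A real system would replace the term-overlap fitness with a proper similarity measure between XML elements and the query, and would evaluate the returned set with precision as the abstract describes.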

Authors: F.Z. Bessai-Mechmache, Z. Alimazighi, K. Hammouche

Download: PDF

AraCovid19-SSD: Arabic Covid-19 Sentiment And Sarcasm Detection Dataset

Coronavirus disease (COVID-19) is an infectious respiratory disease that was first discovered in late December 2019 in Wuhan, China, and then spread worldwide, causing widespread panic and death. Users of social networking sites such as Facebook and Twitter have focused on reading, publishing, and sharing news, tweets, and articles about the newly emerging pandemic.
Many of these users employ sarcasm to convey their intended meaning in a humorous and indirect way, making it hard for computer-based applications to automatically understand and identify their goal and the level of harm they can convey.
Motivated by the emerging need for annotated datasets that tackle these kinds of problems in the context of COVID-19, this paper builds and releases AraCOVID19-SSD, a manually annotated Arabic COVID-19 sarcasm and sentiment detection dataset containing 5,162 tweets.
To confirm the practical utility of the dataset, it has been carefully analyzed and tested using several classification models.

Authors: Mohamed Seghir Hadj Ameur, Hassina Aliane

Download: PDF

A set of rhetorical relationships for educational multimedia document

In this paper, we propose a set of rhetorical relations to support applications such as automatic summary generation and content adaptation of a multimedia document. These relations were proposed in the context of an educational environment and are integrated and handled as part of the logical dimension of the multimedia document.
The proposal is motivated by the need to take into account the particularities inherent to (1) the composition, editing, and presentation of a multimedia document and (2) the educational context.
Indeed, multimedia documents in an educational context are very different from textual documents, for which automatic analysis and generation have led to a set of commonly used rhetorical relations, as described in the work of Mann and Thompson.
Studying this increasingly common context allowed us to go beyond the body of existing work and develop a more appropriate set of rhetorical relationships for educational multimedia documents.
Keywords: multimedia document; educational multimedia documents; RST; rhetorical relationships

Authors: Azze-Eddine Maredj, Madjid Sadallah

Download: PDF



RIST

Revue d'Information Scientifique et Technique
