Browsing by Author "Avramidou, Natalia"
Now showing 1 - 1 of 1
Item: Examining how teacher-student approaches can benefit few-shot learning for toxicity detection tasks (2022-12-30)
Αβραμίδου, Ναταλία; Avramidou, Natalia; Athens University of Economics and Business, Department of Informatics; Pavlopoulos, Ioannis; Androutsopoulos, Ion

The evolution of social media platforms has introduced the need for systems that detect the toxic behavior of users. A toxicity detection system tries to detect user posts that are offensive and abusive. The field of Natural Language Processing (NLP) contributes to detecting this hateful content by automating it with classification models that categorize user posts as offensive or not offensive. As annotating thousands of training examples for NLP models is expensive, training a model effectively with the least amount of labeled data is a significant challenge. Although fully annotated datasets are lacking for many tasks, there is usually a much larger pool of task-specific unlabeled instances that could be used to improve a system’s performance. In this thesis, we focus on toxicity detection in Greek tweets and sexism detection in English posts. Many methods have been explored in the literature for few-shot learning scenarios. Self-training is a semi-supervised method where a Teacher model is initially trained on the few available labeled instances. It then generates silver labels for the larger pool of task-specific unlabeled data. In each round, it samples a number of silver-labeled examples, in most cases based on the model’s confidence. These examples and their silver labels act as additional supervision to iteratively train a stronger Student model. Active Learning tries to maximize the system’s performance gain by identifying the most informative examples to be labeled by a human annotator or, in our case, to be selected among those silver-labeled by the Teacher. In this thesis, we focus on applying the Teacher-Student approach to detect toxic and sexist content when the initial training examples are limited. We also employ Active Learning criteria in the Self-training algorithm to examine whether they could further benefit our system.
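As a rough illustration of the Teacher-Student loop described in the abstract (a minimal sketch, not the thesis’s actual implementation), the snippet below uses scikit-learn’s LogisticRegression over TF-IDF features as both Teacher and Student; the confidence threshold, number of rounds, and the commented-out Active-Learning-style criterion are all hypothetical choices.

```python
# Minimal self-training sketch with confidence-based sampling.
# Assumptions (not from the thesis): a scikit-learn classifier as
# Teacher/Student, TF-IDF features, a fixed confidence threshold.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(labeled_texts, labels, unlabeled_texts,
               rounds=3, confidence=0.9):
    vec = TfidfVectorizer()
    X = vec.fit_transform(list(labeled_texts) + list(unlabeled_texts))
    X_lab, X_unlab = X[:len(labeled_texts)], X[len(labeled_texts):]
    y = np.asarray(labels)

    # Teacher: trained on the few gold-labeled instances.
    model = LogisticRegression(max_iter=1000).fit(X_lab, y)
    pool = np.arange(X_unlab.shape[0])  # indices of still-unlabeled examples

    for _ in range(rounds):
        if len(pool) == 0:
            break
        probs = model.predict_proba(X_unlab[pool])
        conf = probs.max(axis=1)                       # Teacher confidence
        silver = model.classes_[probs.argmax(axis=1)]  # silver labels
        picked = conf >= confidence                    # confidence-based sampling
        # An Active-Learning-style criterion would instead favor the most
        # informative (least confident) examples, e.g.:
        #   picked = conf <= np.quantile(conf, 0.1)
        if not picked.any():
            break
        # Student: retrained on gold data plus the sampled silver labels.
        X_lab = vstack([X_lab, X_unlab[pool[picked]]])
        y = np.concatenate([y, silver[picked]])
        pool = pool[~picked]
        model = LogisticRegression(max_iter=1000).fit(X_lab, y)
    return model, vec
```

With a transformer classifier, as is typical for toxicity detection, the loop’s structure would stay the same; only the fitting and probability-prediction steps change.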
