This Is Auburn

Advancing Online Hate Speech Detection Using External Features and Large Language Models

Date

2024-07-24

Author

Das, Amit

Abstract

Social media was developed to connect people and make the world smaller. Recently, however, it has become a hub for hateful posts that target individuals and communities, and hostile actions and harassing remarks have become increasingly common online. Because such content can cause serious harm to individuals, addressing it should be a high priority. Many Natural Language Processing (NLP) models have been applied to hate speech detection. In our study, we begin by combining BERT with a TF-IDF representation to tackle the challenge of identifying authors who spread irony and stereotypes on Twitter, using a logistic regression classifier for the classification task. Our findings indicate that combining the BERT representation with TF-IDF yields promising results. To examine the problem further, we addressed sexism, another form of hate speech that predominantly targets women. For sexism detection, we introduced a fine-tuned RoBERTa model that encodes the input text with RoBERTa and uses three distinct Multilayer Perceptrons (MLPs), one for each of the three sub-tasks. The experimental results demonstrate the effectiveness of this approach. Additionally, we explored the potential benefits of incorporating external features into sexism and hate speech detection. Specifically, we examined the impact of user gender information on online sexism detection in both binary and multi-class classification settings. Because most sexist comments are directed at individuals of a particular gender, understanding the role of user gender information is important. Our experiments showed that integrating user gender information with textual features improved classification performance in both binary and multi-class classification.
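The feature-combination idea described above (a BERT representation concatenated with TF-IDF features, classified with logistic regression, and later extended with user gender information) can be illustrated with a minimal, dependency-free sketch. It is not the author's implementation: whitespace tokenization, smoothed IDF, the one-hot gender encoding over a hypothetical category set, the gradient-descent training loop, and the hand-written 2-dimensional vectors standing in for BERT embeddings are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical gender categories; the actual encoding used in the study is not given here.
GENDERS = ("female", "male", "unknown")


def tfidf_vectors(docs):
    """Return (vocab, vectors): L2-normalised TF-IDF vectors with smoothed IDF."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def one_hot(value, categories=GENDERS):
    """Encode a categorical user attribute as a one-hot list."""
    return [1.0 if value == c else 0.0 for c in categories]


def combine(embedding, tfidf, extra=()):
    """Concatenate a dense text embedding, a TF-IDF vector and optional external features."""
    return list(embedding) + list(tfidf) + list(extra)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of the positive class for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit binary logistic regression by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data. The 2-d "embeddings" are hand-written stand-ins for BERT sentence vectors.
texts = ["you people are all the same", "great game last night",
         "they never belong here", "lovely weather today"]
labels = [1, 0, 1, 0]
embeddings = [[0.9, 0.1], [0.1, 0.8], [0.8, 0.2], [0.2, 0.9]]
genders = ["male", "female", "unknown", "female"]

vocab, tfidf = tfidf_vectors(texts)
X = [combine(e, t, one_hot(g)) for e, t, g in zip(embeddings, tfidf, genders)]
w, b = train_logreg(X, labels)
preds = [int(predict_proba(w, b, x) >= 0.5) for x in X]
print(preds)
```

Concatenation keeps each feature family in its own coordinates, so the classifier can weight contextual, lexical, and user-level signals independently; dropping the `one_hot(g)` argument gives the text-only baseline for comparison.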
Building on this work, we introduced OffensiveLang, a novel community-based dataset of implicit offensive language generated by ChatGPT 3.5 and covering 38 target groups. Although ChatGPT's ethical constraints limit the generation of offensive text, we devised a prompt-based approach that effectively generates implicit offensive language. To ensure data quality, we evaluated the dataset through human assessment. We also applied a prompt-based zero-shot method with ChatGPT and compared the detection results obtained from human annotations with those obtained from ChatGPT annotations. In addition, we evaluated how effectively existing state-of-the-art models detect such language, and we investigated annotator bias in hate speech data annotation using large language models (LLMs). In the final part of this work, we investigated gender, race, religion, and disability biases in LLMs used for hate speech detection and proposed mitigation strategies. We demonstrated that these biases are present in LLMs such as GPT-4o and GPT-3.5 when they annotate hate speech data. We then explored the factors that contribute to these biases, providing a thorough analysis of the annotated data and emphasizing the role of subjective interpretation. Finally, we proposed potential solutions to mitigate these biases, highlighting the importance of tailored prompts and of fine-tuning LLMs to improve the fairness and accuracy of annotations.
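The zero-shot annotation and human-versus-ChatGPT comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the study's setup: the prompt wording, the binary label set, the reply-parsing rule, and the use of raw accuracy plus Cohen's kappa as agreement measures are all hypothetical choices, and the model replies are hard-coded rather than obtained from ChatGPT (no API is called).

```python
from collections import Counter

# Hypothetical binary label set and prompt wording, for illustration only.
LABELS = ("offensive", "not offensive")


def zero_shot_prompt(text):
    """Build a zero-shot prompt: an instruction and the input, with no labelled examples."""
    return (
        "Classify the following text as 'offensive' or 'not offensive'. "
        "Answer with the label only.\n\n"
        f"Text: {text}\nLabel:"
    )


def parse_label(reply):
    """Map a free-text model reply onto one of LABELS, or None if it cannot be parsed."""
    cleaned = reply.strip().lower().strip(".'\" ")
    # Try longer labels first so "not offensive" is not matched as "offensive".
    for label in sorted(LABELS, key=len, reverse=True):
        if cleaned.startswith(label):
            return label
    return None


def cohen_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hard-coded replies stand in for ChatGPT output; no API is called here.
human = ["offensive", "not offensive", "offensive", "not offensive"]
replies = ["Offensive.", "not offensive", "Not offensive", "not offensive"]
model = [parse_label(r) for r in replies]

accuracy = sum(h == m for h, m in zip(human, model)) / len(human)
kappa = cohen_kappa(human, model)
print(accuracy, kappa)  # prints 0.75 0.5
```

Kappa is reported alongside raw agreement because it discounts the agreement two annotators would reach by chance given their label distributions, which matters when one annotator (human or model) favours a single label.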