Exploring Machine Learning Methods for Author Identification on Micro-Messages
Abstract
Author Identification, also known as Authorship Attribution, is the task of identifying the author of an unknown text based on the writing style captured within a dataset of writing samples. Author Identification is used in a wide variety of fields, including marketing, forensic linguistics, and influence tracing. Writing samples take different forms depending on their audience, length, and platform; common forms include books, articles, emails, and messages. With the increasing use of social media, millions of micro-messages, which are short messages with a length constraint, are exchanged daily. Although micro-messages are a powerful and efficient way for individuals to communicate, their anonymity and short length pose a real challenge for Author Identification. The majority of Author Identification research has focused on finding the authors of long texts, but the development of social media platforms and the emergence of social media as a primary mode of communication have increased interest in Author Identification of micro-messages in many fields, such as social media forensics. Identifying the authors of micro-messages has been shown to be more difficult than Author Identification on long texts. In this work, we systematically design a set of neural network approaches to tackle this problem. Novel evolutionary algorithms and neural network architectures are developed and thoroughly tested to validate their effectiveness in unique environments. Empirically, our proposed method outperforms other state-of-the-art methods in identifying the authors of micro-messages.