A Multi-label Classification System to Distinguish among Fake, Satirical, Objective and Legitimate News in Brazilian Portuguese
Palavras-chave:Fake News, Decision Support System, Text Mining and Multi-Label
ResumoCurrently, there has been a significant increase in the diffusion of fake news worldwide, especially the political class, where the possible misinformation that can be propagated, appearing at the elections debates around the world. However, news with a recreational purpose, such as satirical news, is often confused with objective fake news. In this work, we decided to address the differences between objectivity and legitimacy of news documents, where each article is treated as belonging to two conceptual classes: objective/satirical and legitimate/fake. Therefore, we propose a DSS (Decision Support System) based on a Text Mining (TM) pipeline with a set of novel textual features using multi-label methods for classifying news articles on these two domains. For this, a set of multi-label methods was evaluated with a combination of different base classifiers and then compared with a multi-class approach. Also, a set of real-life news data was collected from several Brazilian news portals for these experiments. Results obtained reported our DSS as adequate (0.80 f1-score) when addressing the scenario of misleading news, challenging the multi-label perspective, where the multi-class methods (0.01 f1-score) overcome by the proposed method. Moreover, it was analyzed how each stylometric features group used in the experiments influences the result aiming to discover if a particular group is more relevant than others. As a result, it was noted that the complexity group of features could be more relevant than others.
Rubin, Victoria, et al. "Fake news or truth? using satirical cues to detect potentially misleading news." Proceedings of the second workshop on computational approaches to deception detection. 2016.
Ruchansky, Natali, Sungyong Seo, and Yan Liu. "Csi: A hybrid deep model for fake news detection." Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 2017.
Singhania, Sneha, Nigel Fernandez, and Shrisha Rao. "3han: A deep neural network for fake news detection." International Conference on Neural Information Processing. Springer, Cham, 2017.
Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis 18 (2010): 1-25.