Fake tweet campaigns come under fire from Indiana scientists

Check out the conclusions of a recent analysis of fake versus real tweet memes by scientists at the School of Informatics and Computing at Indiana University. It could have serious implications for corporate social marketing campaigns in the future. Scroll down to the interesting point highlighted in bold. (PDF: Fake-tweets-identifier)

In this work we proposed a framework to deal with the problem of clustering memes in social media streams, Twitter in particular. Our system is based on a pre-clustering procedure, called protomeme detection, aimed at identifying atomic tokens of information contained in each tweet. This strategy only requires text processing, and is therefore particularly efficient and well suited for a streaming scenario. Memes are thereafter obtained by aggregating protomemes on the basis of the similarity among them, computed by ad-hoc measures defined according to various dimensions including content, the social network, and information diffusion patterns. Such measures only adopt information that can be extracted in a streaming fashion from observed data, and they make it possible to build clusters of topically related tweets. The meme clustering is carried out by using a variant of Online K-means, which integrates a memory mechanism to keep track of the least recently updated memes. We used a dataset comprised of trending hashtags on Twitter to systematically evaluate the performance of our algorithm and we showed that our method outperforms a baseline that only uses tweet text, as well as one that assumes full knowledge of the underlying social network.
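For readers who want a feel for the mechanics, here is a minimal Python sketch of the two ideas in that paragraph: pulling atomic tokens out of tweet text, then grouping tweets online with a bounded memory. It is not the authors' implementation. The token types (hashtags, mentions, URLs), the cosine similarity on token counts, the `threshold` and `max_clusters` parameters, and the rule of evicting the least recently updated cluster are all assumptions made for illustration, and the names `extract_protomemes` and `OnlineClusterer` are hypothetical. The paper's similarity measures also use the social network and diffusion patterns, which this sketch ignores.

```python
import re
from collections import Counter, OrderedDict
from math import sqrt

def extract_protomemes(text):
    """Return the atomic tokens found in a tweet (assumed here to be
    hashtags, mentions, and URLs; hashtags and mentions are lowercased)."""
    tokens = re.findall(r"#\w+|@\w+|https?://\S+", text)
    return {t if t.startswith("http") else t.lower() for t in tokens}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class OnlineClusterer:
    """Simplified streaming clusterer, loosely in the spirit of Online K-means.

    Each cluster keeps a running sum of token counts; cosine similarity is
    scale-invariant, so the sum behaves like a mean centroid here.
    Clusters are kept in least-recently-updated order (an assumption about
    how a memory mechanism might be used).
    """

    def __init__(self, max_clusters=3, threshold=0.3):
        self.max_clusters = max_clusters
        self.threshold = threshold
        self.clusters = OrderedDict()  # cluster id -> Counter, oldest update first
        self._next_id = 0

    def add(self, text):
        """Assign one tweet to a cluster and return the cluster id,
        or None if the tweet contains no protomemes."""
        vec = Counter(extract_protomemes(text))
        if not vec:
            return None
        best_id, best_sim = None, 0.0
        for cid, centroid in self.clusters.items():
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= self.threshold:
            self.clusters[best_id].update(vec)      # fold tweet into the centroid
            self.clusters.move_to_end(best_id)      # mark as most recently updated
            return best_id
        if len(self.clusters) >= self.max_clusters:
            self.clusters.popitem(last=False)       # evict least recently updated
        cid = self._next_id
        self._next_id += 1
        self.clusters[cid] = vec
        return cid
```

Feeding tweets one at a time through `add` means nothing has to be stored beyond the current centroids, which is the property that makes this style of clustering suitable for a stream.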

One crucial feature of our system is that it can be extended to work with any clustering algorithm based on similarity (or distances). In this paper, for example, we chose to present Online K-means because of its simplicity; however, during our design we also tested other methods including density-based and hierarchical data stream clustering algorithms (e.g., DenStream [10] and LiarTree). Although a complete benchmark and tuning of these alternative methods was out of the scope of our analysis, we collected evidence of the ease of extension of our framework to different algorithms.

In the future one could extend the set of features incorporated by our clustering framework, considering for instance entities such as images. Furthermore, our preliminary analysis suggests that the introduction of time series as features may yield significant performance improvements. Our long-term plan is to integrate the meme clustering framework with a meme classifier to distinguish engineered types of social media communication from spontaneous ones. This platform will adopt supervised learning techniques to classify memes and determine their legitimacy, with the aim of detecting misinformation and deception campaigns in their early stages. The platform will be optimized to work with the real-time, high-volume streams of messages typical of Twitter and other online social media.
