
Twitter’s home page features a regularly updated list of topics that are “trending,” meaning that tweets about them have suddenly exploded in volume. A position on the list is highly coveted as a source of free publicity, but the selection of topics is automatic, based on a proprietary algorithm that factors in both the number of tweets and recent increases in that number.

At the Interdisciplinary Workshop on Information and Decision in Social Networks at MIT in November, Associate Professor Devavrat Shah and his student Stanislav Nikolov will present a new algorithm that can, with 95 percent accuracy, predict which topics will trend an average of an hour and a half before Twitter’s algorithm puts them on the list - and sometimes as much as four or five hours before.

The algorithm could be of great interest to Twitter, which could charge a premium for ads linked to popular topics, but it also represents a new approach to statistical analysis that could, in theory, apply to any quantity that varies over time: the duration of a bus ride, ticket sales for films, maybe even stock prices.

Like all machine-learning algorithms, Shah and Nikolov’s needs to be “trained”: it combs through data in a sample set - in this case, data about topics that previously did and did not trend - and tries to find meaningful patterns.

What distinguishes it is that it’s nonparametric, meaning that it makes no assumptions about the shape of patterns. In the standard approach to machine learning, Shah explains, researchers would posit a “model” - a general hypothesis about the shape of the pattern whose specifics need to be inferred.

“You’d say, ‘Series of trending things … remain small for some time and then there is a step,’” says Shah, the Jamieson Career Development Associate Professor in the Department of Electrical Engineering and Computer Science. “Now, based on the data, you try to train for when the jump happens, and how much of a jump happens.”

“The problem with this is, I don’t know that things that trend have a step function,” Shah explains. “There are a thousand things that could happen.” So instead, he says, he and Nikolov “just let the data decide.”

In particular, their algorithm compares changes over time in the number of tweets about each new topic to the changes over time of every sample in the training set. Samples whose statistics resemble those of the new topic are given more weight in predicting whether the new topic will trend or not. In effect, Shah explains, each sample “votes” on whether the new topic will trend, but some samples’ votes count more than others’. The weighted votes are then combined, giving a probabilistic estimate of the likelihood that the new topic will trend.

In Shah and Nikolov’s experiments, the training set consisted of data on 200 Twitter topics that did trend and 200 that didn’t. In real time, they set their algorithm loose on live tweets, predicting trending with 95 percent accuracy and a 4 percent false-positive rate. “The training sets are very small,” he says, “but we still get strong results.” Shah predicts, however, that the system’s accuracy will improve as the size of the training set increases. Of course, the larger the training set, the greater the computational cost of executing Shah and Nikolov’s algorithm. Indeed, Shah says, curbing computational complexity is the reason that machine-learning algorithms typically employ parametric models in the first place.
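The weighted-voting idea Shah describes can be sketched in a few lines of Python. This is only an illustrative toy, not the researchers’ actual method: the function name `trend_votes`, the Euclidean distance between tweet-rate curves, the Gaussian weighting with parameter `gamma`, and the sample data are all assumptions made for the example.

```python
import math

def trend_votes(new_series, examples, gamma=1.0):
    """Nonparametric weighted vote: each labeled example series 'votes'
    on whether new_series will trend, weighted by its similarity to
    new_series (here, a Gaussian kernel over Euclidean distance)."""
    yes, total = 0.0, 0.0
    for series, trended in examples:
        # Distance between two equal-length tweet-rate curves
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_series, series)))
        # Closer curves get exponentially larger voting weight
        weight = math.exp(-gamma * dist ** 2)
        total += weight
        if trended:
            yes += weight
    # Combined weighted votes, as a probability-like estimate of trending
    return yes / total

# Made-up training data: two trending-shaped curves, two flat ones
examples = [
    ([1, 2, 4, 9, 20], True),
    ([0, 1, 3, 8, 18], True),
    ([2, 2, 3, 2, 3], False),
    ([1, 1, 1, 2, 1], False),
]

print(trend_votes([1, 2, 5, 10, 19], examples))  # high, close to 1.0
print(trend_votes([1, 1, 2, 1, 1], examples))    # low, close to 0.0
```

Because every training sample is consulted at prediction time, the cost of a single prediction grows with the size of the training set, which is the computational trade-off of nonparametric methods that the article mentions.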
