Predicting Eurovision 2016 from Twitter data…

…and the winner is… Russia!!! (maybe)

FinalSitePlus200



For an explanation of how the figure was created, see the text below.

This is the 2016 version of my Eurovision prediction. I explained the method in detail in last year's post, which you can find here. In short, I measured how many tweets were sent about each song from each country. From this, I estimated the number of points that each country would give to every other. For example, if Germans tweet the most about the Polish song, I assume that Germany will give Poland 12 points. Notice that this is very different from simply collecting all the tweets and measuring which song was tweeted about the most: that would be heavily biased toward the countries that use Twitter the most, whereas these measurements are normalized per country. Even though this is a very crude estimate with numerous possible caveats, last year's winner was correctly predicted and the overall prediction was quite good (see here for a comparison of the prediction with the actual results). If you are confused by the number of points: the totals are much larger than in previous years because of the change to the voting rules (http://www.eurovision.tv/page/voting).
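As a rough illustration of the counting described above, here is a minimal Python sketch. It is not the code used for the plots. The function name `predict_points`, the input layout and the tweet counts are made up for the example. Only the "most tweeted song gets 12 points" rule comes from the text; filling the remaining places with the standard 10, 8, 7, … 1 scale and dropping a country's own song are my assumptions. Ties and the 2016 rule change are ignored.

```python
from collections import defaultdict

# Points a country awards to its ten favourite songs, best first.
# Only the 12 for first place is stated in the post; the rest of the
# scale is an assumption based on the usual Eurovision scoring.
POINT_SCALE = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]


def predict_points(tweet_counts):
    """tweet_counts[voter][song] = tweets sent from `voter` about `song`."""
    totals = defaultdict(int)
    for voter, counts in tweet_counts.items():
        # Assumption: a country cannot vote for its own song,
        # so it is dropped before ranking.
        ranked = sorted(
            (song for song in counts if song != voter),
            key=lambda song: counts[song],
            reverse=True,
        )
        # Ranking inside each voting country is the per-country
        # normalization: only the order matters, not how many tweets
        # that country sent in total.
        for song, points in zip(ranked, POINT_SCALE):
            totals[song] += points
    return dict(totals)


# Made-up numbers: Germany tweets far more than Malta,
# yet both countries hand out the same number of points.
example_counts = {
    "Germany": {"Poland": 900, "Russia": 700, "Germany": 5000, "Sweden": 100},
    "Malta": {"Russia": 9, "Poland": 3, "Sweden": 5},
}
print(predict_points(example_counts))
```

In this toy input Malta's handful of tweets carries exactly as much weight as Germany's thousands, which is the point of normalizing per country rather than pooling all tweets.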

Below I show some other interesting plots. First, we can see the time dependence of tweets during the semi-finals. Notice how you can see precisely when each country is performing. You can even see when the breaks in the programme are, as well as the beginning of the voting (around 1.6 hours after the start of the programme) and the announcement of the results (the bump at 2 hours).
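A time series like this can be built by counting tweets in fixed-width bins measured from the start of the show. The sketch below is only an illustration under assumptions: `tweets_per_bin`, the 5-minute bin width and the timestamps are invented for the example and are not taken from the actual analysis.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_per_bin(timestamps, show_start, bin_minutes=5):
    """Count tweets in fixed-width bins measured from the start of the show."""
    width = timedelta(minutes=bin_minutes)
    bins = Counter()
    for ts in timestamps:
        if ts >= show_start:
            # timedelta // timedelta gives the integer bin index.
            bins[(ts - show_start) // width] += 1
    return dict(sorted(bins.items()))


# Made-up timestamps: a few tweets early on and a small burst
# about 1.6 hours (96 minutes) after the start.
start = datetime(2016, 5, 10, 21, 0)
stamps = [start + timedelta(minutes=m) for m in (1, 3, 7, 96, 97, 98)]
histogram = tweets_per_bin(stamps, start)
print(histogram)
```

Running this once per country, on the tweets about that country only, would give one curve per country.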

SF1Plus100

SF2Plus100

Even though the semi-finals have finished, only the countries that advance to the final are known, not their semi-final scores. Below I show the predicted number of points in the semi-finals (you can compare them once Eurovision is over and the results are made public). Each country has the same color as in the plots above. Because there are so many countries, some colors unfortunately have to repeat, but the country is always clearly stated below. In the first semi-final the algorithm correctly predicts 9 out of 10 countries that passed to the final (although it fails spectacularly for Estonia, but shh… a reminder that this is just an estimate). For the second semi-final we seem to be doing better, again predicting 9 out of 10 countries, but without catastrophic failures. This already gives us some confidence in our results.

PSF1Plus200

PSF2Plus200

Finally, here I show a prediction derived from a slightly different dataset. For the plot shown at the top, I combined tweets that use the hashtag of a country (e.g. #POL) with tweets that mention the name of the country in English (e.g. Poland) together with the hashtag #EUROVISION. This could potentially bias the result against certain countries (e.g. Russia, which gets a lot of support from nearby, predominantly Russian-speaking countries). On the other hand, using only the hashtag of a country reduces our dataset by roughly 50%. For comparison, the plot derived using only data collected via country hashtags is shown below. As you can see, the result differs in the details, but the basic trends are the same. The results from this plot seem to strengthen Russia's first position.
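To make the difference between the two datasets concrete, here is a small Python sketch of the two selection rules. It is an illustration, not the code behind the plots. The `COUNTRIES` table and the function `mentioned_countries` are hypothetical; only the #POL / Poland pair and the #EUROVISION hashtag appear in the text above.

```python
import re

# Hypothetical lookup table: country hashtag -> English name.
# Only the #POL / Poland pair is taken from the post.
COUNTRIES = {"POL": "Poland", "RUS": "Russia", "SWE": "Sweden"}


def mentioned_countries(text, hashtags_only=False):
    """Return the set of country codes a tweet refers to."""
    tags = {tag.upper() for tag in re.findall(r"#(\w+)", text)}
    # Rule 1: the tweet carries the hashtag of a country.
    found = {code for code in COUNTRIES if code in tags}
    # Rule 2 (combined dataset only): the tweet carries #EUROVISION
    # and mentions the country's English name.
    if not hashtags_only and "EUROVISION" in tags:
        lowered = text.lower()
        for code, name in COUNTRIES.items():
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered):
                found.add(code)
    return found


print(mentioned_countries("Go #POL!"))
print(mentioned_countries("Poland was great #Eurovision"))
print(mentioned_countries("Poland was great #Eurovision", hashtags_only=True))
# A tweet that names the country in Russian is missed by both rules.
print(mentioned_countries("Россия лучше всех #Eurovision"))
```

The last call shows where the possible bias comes from: the name rule only picks up English spellings, so tweets that name a country in another language add nothing to the combined dataset.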

FinalSite200
