It is time for the 2019 version of the prediction of Eurovision results from Twitter data. If you are interested in the method used to make these predictions, it is described in quite a detailed fashion in the post on the first analysis, conducted in 2015, which you can find here (the 2016 results can be found here, the 2017 analysis is here, and the 2018 analysis can be found here).
The bare code (not much documentation) is available here.
So far the track record of these predictions is as follows:
2015 – correct prediction of the winner (Sweden)
2016 – predicted winner (Russia) was second
2017 – correct prediction of the winner (Portugal)
2018 – predicted winner (Cyprus) was second (and Israel, the actual winner, was so, so close in the prediction)
This year the analysis has been trickier than in previous years because:
1. The number of tweets about Eurovision is quite a bit smaller than in previous years, which makes all of the results of the analysis more uncertain (you can see this in the number of tweets as a function of time, shown in the plots below). I am not sure whether there are actually fewer tweets or whether there has been a change to the Twitter API.
2. The hashtags shown on the TV screen during the performances were changed from the more natural hashtags to three-letter ISO codes (for example, Switzerland was #CHE this year, while it was #SUI in all previous years). I see that for these countries the Twitter mentions were often split between the old and the new hashtag. In principle one could merge these lists, but, at least for this year, I only kept the mentions under the official code. Taking into account the "confused" viewers who used the old hashtags could certainly change the results. For instance, counting #NED for the Netherlands would surely bring even more votes to the Netherlands, which is a strong favorite on the betting sites.
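Merging the old and new hashtags, as suggested in point 2, could look roughly like the sketch below. It is not the code used for these predictions; it assumes tweets are available as plain strings, and the alias table is hypothetical (only #CHE/#SUI and #NED come from the text above; #NLD as the official Dutch tag is an assumption based on the ISO 3166-1 alpha-3 code). The `merge` flag switches between this year's choice (official code only) and the merged count.

```python
import re
from collections import Counter

# Hypothetical alias table: official 2019 hashtag -> older hashtags some viewers
# kept using. #NLD as the official Dutch tag is an assumption (ISO alpha-3 code).
HASHTAG_ALIASES = {
    "#CHE": ["#SUI"],
    "#NLD": ["#NED"],
}

HASHTAG_RE = re.compile(r"#\w+")


def count_mentions(tweets, aliases=HASHTAG_ALIASES, merge=True):
    """Count tweets mentioning each official hashtag.

    merge=True credits tweets using an old hashtag to the official one;
    merge=False counts only the official hashtag.
    """
    # Case-insensitive lookup from every accepted hashtag to its official form.
    lookup = {}
    for official, old_tags in aliases.items():
        lookup[official.lower()] = official
        if merge:
            for tag in old_tags:
                lookup[tag.lower()] = official

    counts = Counter()
    for text in tweets:
        # A set, so a tweet using both #CHE and #SUI is counted only once.
        found = {lookup[tag.lower()] for tag in HASHTAG_RE.findall(text)
                 if tag.lower() in lookup}
        counts.update(found)
    return counts


# Made-up example tweets, not real data.
tweets = ["Go #SUI!", "#CHE is great", "#CHE #SUI both", "#NED rocks", "no tags"]
print(count_mentions(tweets, merge=False))  # only #CHE counts: 2 tweets
print(count_mentions(tweets))               # #CHE: 3, #NLD: 1
```

Counting each country at most once per tweet is a design choice of this sketch; it keeps a single enthusiastic tweet with both hashtags from counting double.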
In summary, as you can see, my money is on Australia. On the betting sites Australia is the second favorite, behind the Netherlands (whose number of points might be underestimated here, for the reason given above). Oh well, I guess we will have to wait and see!
Below I am also posting the auxiliary plots that I show every year. First, I show the time dependence of tweets during the semi-finals. Notice how you can see precisely when each country is performing. You can even see when the breaks in the program are. If you are comparing these plots with the old data, note that the normalization of the plots is lower than in previous years.
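The time-dependence plots rest on a simple idea: bin tweets into short time intervals, and each country's hashtag should peak while that country is on stage. A minimal sketch, assuming tweets are already reduced to naive `datetime` timestamps paired with one country hashtag each; the one-minute bin width, the function names, and the placeholder hashtags `#AAA`/`#BBB` are all hypothetical, not taken from the original code.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def minute_of(ts):
    """Truncate a datetime to the start of its minute (the bin width here)."""
    return ts.replace(second=0, microsecond=0)


def tweets_per_minute(timestamps):
    """Return (minute, tweet count) pairs in time order."""
    return sorted(Counter(minute_of(ts) for ts in timestamps).items())


def peak_minute_per_country(tagged_tweets):
    """For (timestamp, hashtag) pairs, return the busiest minute of each hashtag.

    During a live show this peak should fall inside that country's performance.
    """
    per_country = defaultdict(Counter)
    for ts, tag in tagged_tweets:
        per_country[tag][minute_of(ts)] += 1
    return {tag: counts.most_common(1)[0][0] for tag, counts in per_country.items()}


# Made-up timestamps for two placeholder hashtags (not real data).
start = datetime(2019, 1, 1, 20, 0)
toy = [
    (start + timedelta(seconds=10), "#AAA"),
    (start + timedelta(seconds=40), "#AAA"),
    (start + timedelta(minutes=3), "#AAA"),
    (start + timedelta(minutes=4, seconds=5), "#BBB"),
    (start + timedelta(minutes=4, seconds=50), "#BBB"),
]
print(tweets_per_minute([ts for ts, _ in toy]))
print(peak_minute_per_country(toy))  # #AAA peaks at 20:00, #BBB at 20:04
```

A dip in the total per-minute count across all hashtags would correspond to a break in the program.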
Finally, I show below the "predicted" results for the semi-finals, based on the same Twitter analysis. This can give a hint about the quality of the analysis. In both the first and the second semi-final, the algorithm correctly predicts 7 out of the 10 countries that qualified for the final. This is somewhat worse than in previous years, which is perhaps not surprising given the problems with this year's dataset that I mentioned above.
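A figure like "7 out of 10" is the overlap between the predicted and the actual qualifiers. The sketch below shows that bookkeeping under a deliberately simplified, hypothetical rule: the countries with the most mentions qualify. The real scoring is the Twitter analysis described in the 2015 post and may differ; the function names and the `AAA`…`EEE` placeholder countries are made up for illustration.

```python
def predicted_qualifiers(mention_counts, n_qualify=10):
    """Hypothetical rule: the n_qualify most-mentioned countries qualify."""
    # sorted() is stable even with reverse=True, so ties keep dict order.
    ranked = sorted(mention_counts, key=mention_counts.get, reverse=True)
    return ranked[:n_qualify]


def qualifier_hits(mention_counts, actual_qualifiers, n_qualify=10):
    """Number of predicted qualifiers that actually went through."""
    predicted = set(predicted_qualifiers(mention_counts, n_qualify))
    return len(predicted & set(actual_qualifiers))


# Made-up counts for placeholder countries, with 3 qualifying spots.
counts = {"AAA": 50, "BBB": 40, "CCC": 30, "DDD": 20, "EEE": 10}
actual = ["AAA", "CCC", "EEE"]
hits = qualifier_hits(counts, actual, n_qualify=3)
print(f"{hits} out of 3 qualifiers predicted correctly")  # 2 out of 3
```

With 10 spots per semi-final, the same function applied to each semi-final would give the "7 out of 10" style score quoted above.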