The Football World Cup in Russia offers great opportunities to get creative with data analysis. Data has become indispensable in modern football. Top clubs already employ young "laptop trainers" who try to build the ultimate football machine, with statistical analysis and lots of data as their weapons of choice.
All dry statistics aside: football still thrives largely on emotion. Especially during the World Cup, whole nations hold their breath when their national pride battles for eternal glory. I also love to ride this emotional rollercoaster when our Clockwork Orange faces a tough opponent.
This presented a perfect opportunity to measure all these rollercoaster emotions. Sadly, the Dutch national football team is conspicuous by its absence. I have therefore chosen to analyse the semi-final face-off between Croatia and England. Everything is done from the English perspective; as a Dutchman, my proficiency in Croatian is virtually non-existent.
Twitter is a great way to find out what large groups of people feel and think about a certain topic. Tweets contain positive and negative words. Do you count more positive words than negative ones? Then the tweet has a positive sentiment overall. The reverse holds as well: more negative words than positive ones means a negative sentiment.
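The counting rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the R code behind this analysis: the two word lists are tiny, hypothetical stand-ins for a real sentiment lexicon, and the function names are made up for this example.

```python
import re

# Hypothetical mini word lists; a real analysis would use a full sentiment lexicon.
POSITIVE = {"beauty", "proud", "joy", "best", "congratulations", "good", "love"}
NEGATIVE = {"bad", "shit", "sad", "awful", "worst", "lost"}


def tokenize(text):
    """Lowercase the tweet and split it into words (hashtag signs are dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    words = tokenize(text)
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def sentiment_label(text):
    """Map the score to 'positive', 'negative' or 'neutral'."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["Yes TRIPPIER! You beauty! #ENGCRO",
                  "That's bad. That's very bad. #ENGCRO"]:
        print(sentiment_label(tweet), "|", tweet)
```

A tweet without any word from either list scores zero and is labeled neutral here; whether to keep or drop such tweets is a design choice this sketch leaves open.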
This simple method has a drawback, however: the sentiment of a word can shift depending on the words around it. Irony, sarcasm or sayings, for example, can flip the polarity of the sentiment.
This drawback becomes relatively smaller when you gather lots of tweets. For this game, a total of 268,000 tweets were downloaded. That should be enough to plot a reasonably reliable sentiment curve.
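One way to turn scored tweets into a curve is to group them into time buckets and average the scores per bucket, so that a single misjudged tweet carries little weight among many others. The sketch below assumes one-minute buckets and a plain mean; both are illustrative choices, not necessarily those of the original R analysis, and the function name and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def sentiment_curve(scored_tweets):
    """Average the sentiment score per minute.

    scored_tweets: iterable of (timestamp, score) pairs, where timestamp
    is a datetime and score is a number.
    Returns a list of (minute, mean_score) pairs sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, score in scored_tweets:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        buckets[minute].append(score)
    return [(minute, sum(scores) / len(scores))
            for minute, scores in sorted(buckets.items())]


if __name__ == "__main__":
    # Made-up sample data, only to show the input and output shape.
    sample = [
        (datetime(2018, 7, 11, 20, 8, 15), 2),
        (datetime(2018, 7, 11, 20, 8, 40), 1),
        (datetime(2018, 7, 11, 20, 9, 5), -1),
    ]
    for minute, mean_score in sentiment_curve(sample):
        print(minute.strftime("%H:%M"), mean_score)
```

The resulting list of points can be passed to any plotting tool to draw the curve.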
It turned out to be an epic match. England took the lead as early as the fifth minute, but Croatia equalized in the 68th minute and pushed the game into extra time. There they plunged England into mourning by scoring again: a 2-1 final score, shattering England's dream of a first World Cup success since 1966.
The figure below is crystal clear. You can easily follow the flow of events from the English perspective. I have summarized my most striking observations:
Before the match, English fans felt it was already decided: #Footballiscominghome dominated Twitter for days. This positive sentiment is clearly visible in the period before kick-off.
“Got my red England shirt on not been washed smells of beer but who cares it’s coming home #ENGCRO”
The national anthems create a small spike of positive emotions. Many tweets reflect this sentiment:
“I half expected the English fans to start singing “FOOTBALLS COMING HOME” during the national anthem and I’m lowkey…#ENGCRO”
The beautiful free kick by Kieran Trippier causes ecstasy among English Twitter fans. The big upward spike around 20:08 is clearly visible.
“Yes TRIPPIER! You beauty! #BuryLad #ENGCRO #ThreeLions #ITSCOMINGHOME”
Ivan Perisic scores the 1-1 equalizer in the 68th minute, and this causes the most negative sentiment of the entire match. English tweeters are not happy at all:
“Shit! Croatia equalise. That was coming, to be fair… #ENGCRO #WorldCup2018 England 1 – Croatia 1.”
“That’s bad. That’s very bad. #ENGCRO”
In extra time Mario Mandzukic scores the decisive 2-1 for the Croatian side. It is remarkable that the reaction among tweeting England fans is much less negative than after the 1-1:
“So, it’s not coming home then?! 😮 #ENGCRO”
“WOW! England finally breaks! 2-1 Croatia! #WorldCup #ENGCRO”
Shortly after the final whistle, the tweets add up to a very positive sentiment. A glance at these tweets shows that the whole nation is proud of what the Three Lions have achieved. Many fans act as true sportsmen and congratulate Croatia on their victory.
“As much as I wanted an English victory, has to be said the best team won…congratulations #Croatia good luck on Sunday. #WorldCup #ENGCRO”
“Proud of the #eng team, we outlasted some of the best international teams and have been a joy to watch.… #ENGCRO”
The data science techniques behind this analysis are suitable for many forms of unstructured text. Think of public information such as forums, customer reviews or page content. Private information such as e-mail and correspondence is also well suited to the job. Do you want to try this technique for your business? Please feel free to contact me.
Behind this analysis is a technical step-by-step plan and programming code in R. Please send me a message if you would like to learn more about this. I’d love to discuss it with you.
Roeland van der Molen is managing consultant at Leissner & Van der Molen. As a legal and data professional, he has developed over the last ten years into an all-round number cruncher. He helps clients get a better grip on their data, facts and figures and build an effective strategy from them. “Innovation starts from your base”.