Flemish Twitter sentiment
This project is developing a new indicator on the sentiment of Flemish people on Twitter.
This project is currently in progress. The information below should therefore not be regarded as final. The results of this project will be added to this page upon completion.
Objective
The aim of this project is to develop a new experimental statistic based on the sentiment of Flemish Tweets.
Sentiment of a Tweet means “the opinion of the author of the Tweet about the content of the Tweet”. For this statistic, sentiment is categorized into 3 classes:
- positive: the Tweet author expresses a positive view of the subject of the Tweet;
- negative: the Tweet author expresses a negative view of the subject of the Tweet;
- neutral: the author does not express a clearly positive or negative view of the subject of the Tweet.
This statistic will provide near-real-time insight into the sentiment of Flemish Twitter users.
This statistic makes it possible to immediately monitor the impact of events on the sentiment of the active Twitter population.
Data
Every hour of the day, approximately 500 random Dutch-language Tweets from Belgian accounts are requested via the Twitter Development API.
This project only aims to develop an aggregated sentiment indicator based on random samples of Tweets. The selection of the sample is done by Twitter via the Twitter Development API. There is no monitoring of individual Twitter users.
Underlying research
The use of Twitter data to produce public statistics is new and raises a number of methodological and social questions. Some studies addressing these questions have already been carried out, and further research is planned. The completed studies cover the following topics:
- Analyzing selectivity in the population of Twitter users.
- Comparing Twitter indicators for the specific policy area of ‘lifelong learning’ with existing indicators.
- Comparing different categories of sentiment models for Flemish Tweets.
- Mapping differences between socio-demographic groups in manual sentiment labelling of Tweets in order to create a correct training set for machine learning models.
The results of this research are used directly by the Data Science Hub to ensure that the resulting statistic “Flemish Twitter sentiment” is of the highest possible quality.
Processing of personal data
In carrying out this project, Tweets are processed from Twitter accounts with a location in Belgium. Because Tweets fall under the definition of personal data¹, the Flemish Statistical Authority (VSA) must comply with legislation and guidelines regarding the processing of personal data, such as the General Data Protection Regulation (GDPR).
The VSA performs a task of general interest: coordinating the development, production and dissemination of Flemish public statistics and assuring their quality (Administrative Decree of 7 December 2018, Section 8, Organization of the statistics policy, Articles III.107 to 113 inclusive). Under the GDPR, further processing for statistical purposes (by the VSA) is considered compatible with the original purposes (of Twitter). The VSA has concluded an agreement with Twitter that lays down, among other things, the statistical purposes of the VSA. To achieve those purposes, the VSA processes only the personal data that is necessary and keeps it no longer than necessary (see below).
The Tweets are processed in the following way:
- every hour, approximately 500 random Dutch-language Tweets from Belgian Twitter accounts are obtained via the Twitter Development API. Via that application programming interface (API), the Twitter username, the Tweet ID, the text of the Tweet, the date, the number of likes, and the number of retweets are obtained for each Tweet. The numbers of likes and retweets are provided by the Twitter API by default, but are not further processed;
- the obtained Tweets are assigned a sentiment score (namely positive, neutral or negative) through a machine-learning model developed and managed by the VSA;
- the sentiment scores are aggregated by hour, day, week, month and year. The most common words in the Tweets are also tracked. These most common words are shown in the sentiment statistic as extra information on what is being tweeted about at any given time. Under no circumstances are analyses made of individual Twitter accounts. The Twitter usernames are used solely to ensure that each Twitter user has an equal impact on the statistics produced, regardless of the number of Tweets that user has written;
- throughout the development phase of this project, the Tweet data will be retained by the VSA. This is necessary for carrying out checks on the quality of the statistics production. The data is stored in an environment that is only accessible to employees of the VSA;
- once the Twitter sentiment has been published, the VSA retains the Tweet data for a maximum of 12 months for quality monitoring of the statistics production. The aggregated sentiment scores and word frequency data will be retained for as long as the statistic is provided by the VSA.
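The aggregation step described above can be illustrated with a minimal Python sketch. It is not the VSA's implementation: the record layout, the function names `hourly_sentiment_shares` and `most_common_words`, and the sample Tweets are all hypothetical. In particular, the weighting scheme (each user contributes a total weight of 1 per hour, split evenly over that user's Tweets) is only one possible way to give every user an equal impact; the source does not specify how this is done.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical records: (username, timestamp, sentiment label, Tweet text).
# In practice the labels would come from the sentiment model; here they are hard-coded.
TWEETS = [
    ("user_a", datetime(2023, 1, 1, 10, 5), "positive", "mooi weer vandaag"),
    ("user_a", datetime(2023, 1, 1, 10, 20), "positive", "heel mooi weer"),
    ("user_a", datetime(2023, 1, 1, 10, 40), "positive", "nog steeds mooi"),
    ("user_b", datetime(2023, 1, 1, 10, 15), "negative", "trein weer te laat"),
    ("user_c", datetime(2023, 1, 1, 11, 2), "neutral", "trein vertrekt om twaalf uur"),
]


def hourly_sentiment_shares(tweets):
    """Return {hour: {label: share}} with every user weighted equally per hour."""
    # Count the labels of each user's Tweets within each hour.
    per_user = defaultdict(Counter)  # (hour, user) -> Counter of labels
    for user, ts, label, _text in tweets:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        per_user[(hour, user)][label] += 1

    # Assumed scheme: each user carries a total weight of 1 per hour,
    # split evenly over that user's Tweets in the hour.
    totals = defaultdict(lambda: defaultdict(float))
    users_per_hour = Counter()
    for (hour, _user), labels in per_user.items():
        n_tweets = sum(labels.values())
        users_per_hour[hour] += 1
        for label, count in labels.items():
            totals[hour][label] += count / n_tweets

    # Normalize by the number of distinct users so the shares sum to 1.
    return {
        hour: {label: w / users_per_hour[hour] for label, w in by_label.items()}
        for hour, by_label in totals.items()
    }


def most_common_words(tweets, top_n=3):
    """Return the top_n most frequent lowercase words across all Tweet texts."""
    words = Counter()
    for _user, _ts, _label, text in tweets:
        words.update(text.lower().split())
    return words.most_common(top_n)
```

In this sample, user_a wrote three positive Tweets between 10:00 and 11:00 and user_b one negative Tweet, yet both users count equally in the hourly shares. The usernames serve only as grouping keys and do not appear in the aggregated output.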
A data subject whose personal data is processed by the VSA has the right to request access to and rectification or erasure of that personal data, to request the restriction of the processing concerning them, and to object to the processing. Such requests can be submitted to the data protection officer (DPO) of the VSA via dpo.sv@vlaanderen.be.
A data subject always has the right to submit a complaint:
- about a Flemish body (such as the VSA) with the Flemish Supervisory Commission for the processing of personal data (VTC), which is only competent for bodies of the Flemish government. After completing the complaint form (in Dutch), you can send it by e-mail (contact@toezichtcommissie.be) or by post (for the attention of the Flemish Supervisory Commission, King Albert II-avenue 15, 1210 Brussels);
- at the Data Protection Authority (GBA), which is the federal supervisory authority in the field of personal data protection. After completing the complaint form (in Dutch), you can send it via the website or print it and send it by post (for the attention of the Data Protection Authority, Drukpersstraat 35, 1000 Brussels),
and to take legal action against the processing of the data by the VSA.
¹ Art. 4(1) GDPR: “personal data” means any information relating to an identified or identifiable natural person (“the data subject”); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, psychological, economic, cultural or social identity of those