New and more timely business statistics based on web scraping
This project is still ongoing. Once it has been completed, a full description will be published on this page.
Objective
In this project, business statistics are developed based on natural language processing applied to the texts scraped from the websites of Belgian companies.
This approach allows business statistics to be produced at a high frequency and to cover the entire set of companies with a known website.
Data
This project uses a dataset of all companies that have a legal entity in Belgium and a known website URL.
Methods
This project uses web scraping to download visible texts from company websites.
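The sketch below illustrates what downloading the visible text of a company website can involve, using only the Python standard library. It is not the project's scraper: the set of tags treated as invisible (`SKIP_TAGS`), the `User-Agent` string and the function names `fetch_html` and `extract_visible_text` are assumptions made for illustration. The simple tag-based rule also ignores text hidden with CSS or generated by JavaScript.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: a simplified list of elements whose content is not shown to
# visitors. Browsers apply more rules (e.g. CSS display:none).
SKIP_TAGS = {"script", "style", "noscript", "template", "head"}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it with the charset declared by the server."""
    # Hypothetical User-Agent string, for illustration only.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class VisibleTextParser(HTMLParser):
    """Collects text nodes that are not inside a SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside a skipped element
        self.chunks: list[str] = []  # visible text fragments, in page order

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_visible_text(html: str) -> str:
    """Return the visible text of an HTML page as a single string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (spaces, newlines, tabs) into single spaces.
    return " ".join(" ".join(parser.chunks).split())
```

For a page such as `<head><title>Acme</title></head><body><h1>Acme BV</h1><script>var x = 1;</script><p>We make bakery products.</p></body>`, `extract_visible_text` keeps only `Acme BV We make bakery products.`; calling `fetch_html` first would apply the same extraction to a live URL.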
This project uses natural language processing (NLP) and machine learning to categorize the visible text automatically.
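As an illustration of automatic text categorization, the sketch below trains a multinomial Naive Bayes classifier on a handful of labelled texts. The project does not state which model it uses; Naive Bayes is swapped in here only because it is a common baseline that fits in a few lines of standard-library Python. The categories, training texts and the names `tokenize` and `NaiveBayesClassifier` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of Unicode letters."""
    # [^\W\d_] matches letters only, so accented French/Dutch words survive.
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts = Counter()             # documents per category
        self.word_counts = defaultdict(Counter)   # token counts per category
        self.vocabulary = set()                   # all tokens seen in training

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text: str) -> str:
        # Words never seen during training carry no information; skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = "", -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(category) + sum over tokens of log P(token | category)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical labelled examples, for illustration only.
    texts = [
        "fresh bread pastries and cakes baked daily",
        "artisan bakery with sourdough bread",
        "custom software development and cloud hosting",
        "we build web applications and software tools",
    ]
    labels = ["bakery", "bakery", "software", "software"]
    model = NaiveBayesClassifier().fit(texts, labels)
    print(model.predict("sourdough bread and croissants"))      # bakery
    print(model.predict("cloud hosting for web applications"))  # software
```

Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on long web pages. On real company websites, any such classifier would need a sizeable labelled training set and an evaluation of its accuracy before its output could feed into statistics.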
Results
This project is still ongoing, and no shareable results are available yet. Once available, they will appear on this page.