Full DB Dump
Is there a full data dump available somewhere?
I'm doing research and data visualization (and I suspect many here do too), which requires all the data at once. Scraping the API is cumbersome and also consumes precious CPU time on this service.
A giant CSV, JSON, or SQL file updated once a month would be awesome. Wikipedia and Stack Overflow provide similar dumps, and they are quite popular.
Hi there. It's far from a full data dump, but I have started to pull data related to the Plenary here: https://github.com/scipima/ep_vote_collect.git. The README explains how to get either the data for the daily Plenary or for the full mandate. Hope this helps, Marco
Datasets can be downloaded from the EP Open Data Portal: https://data.europarl.europa.eu/en/datasets
Thank you for the suggestion, but the dataset portal only contains a fraction of the API data, and the 236 files have to be downloaded manually.
A full dump would be greatly appreciated.
In the meantime I'm working on a JS library to dump the API, similar to @scipima's work, and I might open-source it at some point.
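To give an idea, here is a minimal sketch of the library's core loop (assumptions on my side: Node 18+ for the global fetch, the v2 base path, and JSON-LD content negotiation; this is not the final library code):

```ts
// Minimal sketch of a rate-limited dump loop for per-meeting vote results.
// Assumptions: Node 18+ (global fetch), and that the API serves JSON-LD
// via content negotiation at this base path.
import { promises as fs } from "node:fs";

const BASE = "https://data.europarl.europa.eu/api/v2"; // assumed base path
const DELAY_MS = 1_000; // pause between calls to spare the API

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function dumpVoteResults(eventIds: string[]): Promise<void> {
  await fs.mkdir("dump", { recursive: true });
  for (const id of eventIds) {
    const res = await fetch(`${BASE}/meetings/${id}/vote-results`, {
      headers: { Accept: "application/ld+json" },
    });
    if (!res.ok) {
      console.warn(`skipping ${id}: HTTP ${res.status}`);
      continue;
    }
    await fs.writeFile(`dump/${id}.json`, await res.text());
    await sleep(DELAY_MS); // rate limit between requests
  }
}

dumpVoteResults(["MTG-PL-2024-01-15"]).catch(console.error); // hypothetical event id
```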
> Thank you for the suggestion, but the dataset portal only contains a fraction of the API data,
Can you be more specific about this? What is in the API data that is not in the datasets? I can understand that the datasets are not as fresh as the API data, but other than that I would expect the RDF content to be identical to that of the API.
> and the 236 files have to be downloaded manually.
If one can scrape thousands of API calls, one can scrape 236 file downloads :-) (in reality, 236 × 28 languages). This could be an alternative way to recreate a full DB dump (though, as I said, probably not as fresh) without stressing the API.
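For example, once you have collected the file links from the datasets page, the downloads themselves script easily. A sketch, where the URL list is a placeholder to be filled from the portal:

```ts
// Sketch: download each dataset file with a pause between requests.
// The URL list is a placeholder; collect the actual links from
// https://data.europarl.europa.eu/en/datasets first.
import { promises as fs } from "node:fs";

const urls: string[] = [
  // ...dataset download links gathered from the portal...
];

async function downloadAll(): Promise<void> {
  await fs.mkdir("datasets", { recursive: true });
  for (const url of urls) {
    const res = await fetch(url);
    if (!res.ok) {
      console.warn(`skipping ${url}: HTTP ${res.status}`);
      continue;
    }
    const name = new URL(url).pathname.split("/").pop() || "file";
    await fs.writeFile(`datasets/${name}`, Buffer.from(await res.arrayBuffer()));
    await new Promise((r) => setTimeout(r, 500)); // be polite to the portal
  }
}

downloadAll().catch(console.error);
```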
@tfrancart I was thinking of /meetings/{event-id}/vote-results. Is there a way to retrieve it from the datasets page?
Thank you for your help, I am still new to this ecosystem. I have rate-limited my dump scripts for now.
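For what it's worth, the rate limiting is just a serialized fetch wrapper with a back-off on HTTP 429 (the intervals are arbitrary choices on my side):

```ts
// Sketch of the throttling: serialize requests with a minimum interval,
// and back off and retry when the server answers 429 Too Many Requests.
const MIN_INTERVAL_MS = 1_000; // arbitrary; tune to the API's tolerance
let lastCall = 0;

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function politeFetch(url: string, retries = 3): Promise<Response> {
  const wait = lastCall + MIN_INTERVAL_MS - Date.now();
  if (wait > 0) await sleep(wait);
  lastCall = Date.now();
  const res = await fetch(url);
  if (res.status === 429 && retries > 0) {
    await sleep(MIN_INTERVAL_MS * 4); // back off before retrying
    return politeFetch(url, retries - 1);
  }
  return res;
}
```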