open-data-beta-testing

Full DB Dump

Open arnaudsm opened this issue 1 year ago • 5 comments

Is there a full data dump available somewhere?

I'm doing research and dataviz (and I suspect many here also do), which requires all the data at once. Scraping the API is cumbersome and also uses precious CPU time from this service.

Having a giant CSV, JSON, or SQL file updated once a month would be awesome. Wikipedia and Stack Overflow provide similar dumps, which are quite popular.

arnaudsm avatar Apr 28 '24 17:04 arnaudsm

Hi there, this is far from a full data dump, but I have started pulling data related to the Plenary here: https://github.com/scipima/ep_vote_collect.git. The README explains how to get the data either for the daily Plenary or for the full mandate. Hope this helps, Marco

scipima avatar Apr 28 '24 17:04 scipima

Datasets can be downloaded from the EP Open Data Portal: https://data.europarl.europa.eu/en/datasets

tfrancart avatar Apr 29 '24 08:04 tfrancart

Thank you for the suggestion, but the dataset portal only contains a fraction of the API data, and the 236 files have to be downloaded manually.

A full dump would be greatly appreciated.

In the meantime, I'm working on a JS library to dump the API, similar to @scipima's work, and might open-source it at some point.

arnaudsm avatar Apr 30 '24 12:04 arnaudsm

> Thank you for the suggestion, but the dataset portal only contains a fraction of the API data,

Can you be more specific about this? What is in the API data that is not in the datasets? I can understand that the datasets are not as fresh as the API data, but other than that, I would expect the RDF content to be identical to what the API returns.

> and the 236 files have to be downloaded manually.

If one can script thousands of API calls, one could script 236 file downloads :-) (in reality, 236 * 28 languages). This could be an alternative way to recreate a full DB dump (but, as I said, probably not as fresh) without stressing the API.
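A minimal sketch of what scripting those downloads could look like, assuming you have already collected the list of file URLs from the datasets page (this thread does not give them, so `urls` is left to the caller). The names `mirror_files`, `http_get`, and `pause_s` are hypothetical helpers for illustration, not part of any EP tooling.

```python
import time
from pathlib import Path
from typing import Callable, Iterable
from urllib.parse import urlparse
from urllib.request import urlopen


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def mirror_files(
    urls: Iterable[str],
    out_dir: Path,
    fetch: Callable[[str], bytes] = http_get,
    pause_s: float = 1.0,
) -> list[Path]:
    """Download each URL into out_dir, skipping files already present."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last path segment of the URL as the local file name.
        name = Path(urlparse(url).path).name
        if not name:
            raise ValueError(f"cannot derive a file name from {url!r}")
        target = out_dir / name
        if target.exists():
            continue  # already downloaded on a previous run
        target.write_bytes(fetch(url))
        saved.append(target)
        time.sleep(pause_s)  # pause between requests to stay polite
    return saved
```

Because existing files are skipped, the script can be re-run after an interruption without downloading everything again. Files that share a name across languages would need a per-language `out_dir`.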

tfrancart avatar Apr 30 '24 13:04 tfrancart

@tfrancart I was thinking of /meetings/{event-id}/vote-results. Is there a way to retrieve it from the datasets page?

Thank you for your help, I am still new to this ecosystem. I have rate-limited my dump scripts for now.
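For anyone writing a similar script, here is a minimal sketch of one way to rate-limit calls, assuming a fixed minimum interval between requests is acceptable. `MinIntervalLimiter`, `vote_results_path`, and `fetch_vote_results` are hypothetical names for illustration; only the `/meetings/{event-id}/vote-results` path comes from this thread, and the actual HTTP call is left to a caller-supplied `fetch` function.

```python
import time
from typing import Callable, Iterable


class MinIntervalLimiter:
    """Ensure at least min_interval_s seconds between consecutive calls."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> None:
        """Block until the minimum interval since the last call has passed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def vote_results_path(event_id: str) -> str:
    """Build the endpoint path mentioned above for one meeting."""
    return f"/meetings/{event_id}/vote-results"


def fetch_vote_results(event_ids: Iterable[str],
                       fetch: Callable[[str], bytes],
                       limiter: MinIntervalLimiter) -> dict:
    """Fetch vote results for each event, pausing between requests."""
    results = {}
    for event_id in event_ids:
        limiter.wait()
        results[event_id] = fetch(vote_results_path(event_id))
    return results
```

The clock and sleep functions are injectable so the limiter can be tested without real delays. A suitable interval depends on what the service operators consider acceptable.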

arnaudsm avatar May 03 '24 18:05 arnaudsm