update world_bank scraper
Caution: causes many changes/moves in the data CSV
Hi, jumping in here to point out that most of the changes are due to the fact that the dataset is now sorted by wb and not by country anymore. Isn't sorting by country again an easy way to reduce the number of changes here? Sorry if I missed sth obvious about this
Hi, jumping in here to point out that most of the changes are due to the fact that the dataset is now sorted by
wband not bycountryanymore. Isn't sorting bycountryagain an easy way to reduce the number of changes here? Sorry if I missed sth obvious about this
Yes, but my goal here was to update the scraper with minimal changes, not to update the data (which is typically done all together in a separate process). Just pointing it out for @vincentarelbundock's awareness.
It does make it hard to see what, if anything, changed as a result.
Thanks all. I added one line of code with an arrange to make the diff easier to see. The changes look pretty minimal.
You can merge whenever you want.
@vincentarelbundock I was under the impression that all of these "getter" scripts got run together in a Docker during some part of the process you do, so I included the changed CSV just to facilitate reviewing the consequences of the changes I made, but with the intention of removing the changed CSV before merging. Looking a bit deeper at things now, it looks like maybe that doesn't happen anymore or the process has changed? Should I included the modified CSV as well, if it changes, when making a change to a getter script?
@vincentarelbundock I was under the impression that all of these "getter" scripts got run together in a Docker during some part of the process you do, so I included the changed CSV just to facilitate reviewing the consequences of the changes I made, but with the intention of removing the changed CSV before merging. Looking a bit deeper at things now, it looks like maybe that doesn't happen anymore or the process has changed? Should I included the modified CSV as well, if it changes, when making a change to a getter script?
Yeah, it's been a while, but if I remember correctly, my previous setup was ridiculously over-(and badly-) engineered. So I simplified everything. get_*() saves a CSV file that we keep in the repo. Then, build.R reads and merges all the CSV files.
I always run it on my local machine; never on Docker. Maybe I should do that, but things have worked ok thus far...