countrycode icon indicating copy to clipboard operation
countrycode copied to clipboard

update world_bank scraper

Open cjyetman opened this issue 2 years ago • 6 comments

Caution: causes many changes/moves in the data CSV

cjyetman avatar Sep 09 '23 11:09 cjyetman

Hi, jumping in here to point out that most of the changes are due to the fact that the dataset is now sorted by wb and not by country anymore. Isn't sorting by country again an easy way to reduce the number of changes here? Sorry if I missed sth obvious about this

etiennebacher avatar Sep 09 '23 12:09 etiennebacher

Hi, jumping in here to point out that most of the changes are due to the fact that the dataset is now sorted by wb and not by country anymore. Isn't sorting by country again an easy way to reduce the number of changes here? Sorry if I missed sth obvious about this

Yes, but my goal here was to update the scraper with minimal changes, not to update the data (which is typically done all together in a separate process). Just pointing it out for @vincentarelbundock's awareness.

cjyetman avatar Sep 09 '23 12:09 cjyetman

It does make it hard to see what, if anything, changed as a result.

NilsEnevoldsen avatar Sep 09 '23 14:09 NilsEnevoldsen

Thanks all. I added one line of code with an arrange to make the diff easier to see. The changes look pretty minimal.

You can merge whenever you want.

vincentarelbundock avatar Sep 09 '23 14:09 vincentarelbundock

@vincentarelbundock I was under the impression that all of these "getter" scripts got run together in a Docker during some part of the process you do, so I included the changed CSV just to facilitate reviewing the consequences of the changes I made, but with the intention of removing the changed CSV before merging. Looking a bit deeper at things now, it looks like maybe that doesn't happen anymore or the process has changed? Should I included the modified CSV as well, if it changes, when making a change to a getter script?

cjyetman avatar Sep 09 '23 15:09 cjyetman

@vincentarelbundock I was under the impression that all of these "getter" scripts got run together in a Docker during some part of the process you do, so I included the changed CSV just to facilitate reviewing the consequences of the changes I made, but with the intention of removing the changed CSV before merging. Looking a bit deeper at things now, it looks like maybe that doesn't happen anymore or the process has changed? Should I included the modified CSV as well, if it changes, when making a change to a getter script?

Yeah, it's been a while, but if I remember correctly, my previous setup was ridiculously over-(and badly-) engineered. So I simplified everything. get_*() saves a CSV file that we keep in the repo. Then, build.R reads and merges all the CSV files.

I always run it on my local machine; never on Docker. Maybe I should do that, but things have worked ok thus far...

vincentarelbundock avatar Sep 09 '23 18:09 vincentarelbundock