Data has gone stale

Open sigmondzavion opened this issue 2 years ago • 10 comments

The last update appears to be 4/16.

Q: Do you have an ETA for making the data current?

sigmondzavion avatar Apr 24 '22 13:04 sigmondzavion

Hi, any answer to this?

aminoplis avatar May 20 '22 18:05 aminoplis

Still stale. :-(

alan-isaac avatar Jul 16 '22 18:07 alan-isaac

Hello @anuveyatsu

I hope you are good.

I would love to take up this task to update the data on this repo.

I am currently working on it.

seun-beta avatar Oct 11 '22 10:10 seun-beta

Hello @anuveyatsu

I was able to track down the problem. The GitHub Actions workflow fails because the CSV files are larger than 100 MB, which is GitHub's maximum file size.
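For anyone who wants to check which files are affected, here is a minimal sketch that lists the CSVs above the limit. The function name `find_oversized_files`, the `data_dir` argument, and the exact byte value of the limit (100 MiB) are assumptions for illustration and are not taken from this repo's scripts.

```python
from pathlib import Path

# Assumption: the limit is treated as 100 MiB expressed in bytes.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized_files(data_dir, pattern="*.csv", limit_bytes=GITHUB_FILE_LIMIT_BYTES):
    """Return (path, size_in_bytes) for every matching file larger than limit_bytes.

    Hypothetical helper: data_dir is whatever directory the workflow writes to.
    """
    oversized = []
    # rglob searches data_dir and all of its subdirectories.
    for path in sorted(Path(data_dir).rglob(pattern)):
        size = path.stat().st_size
        if size > limit_bytes:
            oversized.append((path, size))
    return oversized
```

Running something like this before the commit step would at least make the workflow fail with a clear message instead of a rejected push.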

I think the results should either be written to CSV and then compressed (e.g. zipped) to reduce the size, or Parquet should be used as the file format instead.

Please let me know what you think about it.

seun-beta avatar Oct 12 '22 16:10 seun-beta

Thank you @seun-beta for spending time to investigate this issue 👍🏼

I think the best option would be to use Git LFS (https://docs.github.com/en/repositories/working-with-files/managing-large-files/configuring-git-large-file-storage) so that we can keep the data in a consistent format. I'm not sure you'd be able to complete it, though, because I think we would need to wire up external blob storage here (e.g., S3, Google Cloud Storage, etc.).

anuveyatsu avatar Oct 22 '22 07:10 anuveyatsu

Hello @anuveyatsu

Thank you for your response. I also researched Git LFS initially but the overall setup was a little too much.

An idea about using S3 and Boto3 just popped into my mind. When the workflow run is triggered based on the cron configuration, the code could push results into S3 directly.
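Roughly, the upload step could look like the sketch below. It is only an illustration: the function name `push_csvs_to_s3`, the bucket name, and the key prefix are hypothetical, and the client is passed in rather than created inside the function so the logic can be tried without AWS credentials. The only Boto3 call it relies on is the S3 client's `upload_file(Filename, Bucket, Key)`.

```python
from pathlib import Path


def push_csvs_to_s3(client, data_dir, bucket, prefix="covid-19/"):
    """Upload every CSV in data_dir to the given bucket and return the object keys.

    client is expected to behave like boto3.client("s3"); bucket and prefix
    are placeholders chosen for this example.
    """
    uploaded_keys = []
    for path in sorted(Path(data_dir).glob("*.csv")):
        key = f"{prefix}{path.name}"
        # boto3's S3 client takes the local filename, the bucket, and the key.
        client.upload_file(str(path), bucket, key)
        uploaded_keys.append(key)
    return uploaded_keys
```

In the scheduled workflow this might be called as `push_csvs_to_s3(boto3.client("s3"), "data", "example-bucket")`, with the AWS credentials supplied through repository secrets.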

What do you think about that?

seun-beta avatar Oct 22 '22 08:10 seun-beta

Hello,

I've tried deploying Git LFS, and I'm getting this error:

> [main eb55196] Auto-update of the data packages
>  8 files changed, 24 insertions(+), 8580484 deletions(-)
> batch response: @github-actions[bot] can not upload new objects to public fork mforsetti/covid-19
> error: failed to push some refs to 'https://github.com/mforsetti/covid-19'
> Error: Process completed with exit code 1.

Apparently Git LFS refuses to push to a fork when the parent repo does not use Git LFS. See git-lfs/git-lfs#1906.

What about gzip-ing the generated CSVs? We can add gunzip-ing code into scripts/update_datapackage.py script.
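A minimal sketch of what I mean, using only the Python standard library. The helper names `gzip_csv` and `read_gzipped_csv_rows` are made up for this example and do not exist in scripts/update_datapackage.py today.

```python
import csv
import gzip
import shutil
from pathlib import Path


def gzip_csv(csv_path):
    """Write a gzip-compressed copy next to csv_path (e.g. data.csv -> data.csv.gz)."""
    src = Path(csv_path)
    dst = src.with_name(src.name + ".gz")
    # Stream the bytes through gzip so large files are never held in memory.
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


def read_gzipped_csv_rows(gz_path):
    """Read the rows back from a gzipped CSV without unpacking it to disk."""
    with gzip.open(gz_path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

Only the `.csv.gz` files would be committed, and any script that needs the data can read them back directly as shown in the second function.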

mforsetti avatar Nov 13 '22 11:11 mforsetti