Can you zip the HTML 2018 folder?
Hi, Fery.
Could you compress the files for each year in the HTML directory into one archive per year? Zipping the folder online takes a very long time.
Thanks
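For reference, a minimal sketch of what I mean, assuming the HTML directory holds one subfolder per year (that layout, and the names `zip_each_year`, `html_dir` and `out_dir`, are my assumptions, not something taken from the repo):

```python
import shutil
from pathlib import Path


def zip_each_year(html_dir, out_dir):
    """Create one zip archive per immediate subfolder of html_dir.

    Assumes each subfolder is one year (e.g. "2018"); out_dir should
    live outside html_dir so archives are not picked up as input.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archives = []
    for year_dir in sorted(p for p in Path(html_dir).iterdir() if p.is_dir()):
        # make_archive appends ".zip" to the base name and returns the full path.
        archive = shutil.make_archive(str(out / year_dir.name), "zip", root_dir=str(year_dir))
        archives.append(archive)
    return archives
```

Running something like `zip_each_year("HTML", "HTML_zipped")` locally should leave one `.zip` per year that can be uploaded as a single file each.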
Sorry, just saw this. Let me see if I can do it.
Any news about this? Zipping online either doesn't work or just gets stuck.
I just saw the size of the whole folder. Is it really 60 GB?
HAHAHA. Just saw this issue while also trying to zip the folder.
Right now I'm copying each JSON file to my own Google Drive via Colab. Copying 110k files took 3 hours :/. I'll dump it into a CSV file next (which may or may not be a stupid idea :/). I'll update you guys when I'm done.
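A rough sketch of the CSV dump I have in mind, assuming each JSON file holds one flat object (the field names are unknown here, so the header is just the union of all keys found; `json_dir_to_csv` and its parameters are hypothetical names):

```python
import csv
import json
from pathlib import Path


def json_dir_to_csv(json_dir, csv_path):
    """Combine every *.json file under json_dir into a single CSV file.

    Assumes one flat JSON object per file. Reads the files twice so that
    only one record is held in memory at a time.
    """
    paths = sorted(Path(json_dir).rglob("*.json"))

    # First pass: collect the union of keys, in first-seen order, for the header.
    fieldnames = []
    for path in paths:
        record = json.loads(path.read_text(encoding="utf-8"))
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # Second pass: write one row per file; missing keys become empty cells.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in paths:
            writer.writerow(json.loads(path.read_text(encoding="utf-8")))
    return len(paths)
```

Nested values (lists or dicts) would end up as their Python string form in the cell, so this only makes sense if the records really are flat.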
@wildangunawan I already put it on Hugging Face as a dataset (https://huggingface.co/datasets/id_newspapers_2018), so you don't need to copy it from Google Drive, which is too slow :-)
Aaaahhh love it. Thanks Pak @cahya-wirawan :D. Maybe Pak @feryandi can add the link to the Hugging Face dataset to the README.