Dataset-Artikel

can you zip HTML 2018 folder

Open andreaschandra opened this issue 6 years ago • 6 comments

Hi, Fery.

Could you compress the files for each year in the HTML directory? Zipping the folder online takes a very long time.

Thanks
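For anyone with a local copy of the repository, the request above could be sketched roughly as follows. This is only an illustration, assuming the HTML directory holds one subdirectory per year; the function name `zip_each_year` and both paths are hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def zip_each_year(html_dir, out_dir):
    """Create one .zip archive per immediate subdirectory of html_dir.

    Assumes out_dir lives outside html_dir (hypothetical layout).
    """
    html_dir = Path(html_dir).resolve()
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)

    archives = []
    for year_dir in sorted(p for p in html_dir.iterdir() if p.is_dir()):
        # make_archive appends ".zip" to base_name and returns the archive path.
        archive = shutil.make_archive(
            str(out_dir / year_dir.name),
            "zip",
            root_dir=str(html_dir),
            base_dir=year_dir.name,
        )
        archives.append(Path(archive))
    return archives
```

Calling `zip_each_year("HTML", "archives")` would then produce files such as `archives/2018.zip`, each of which can be downloaded on its own instead of zipping the whole folder online.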

andreaschandra avatar Jun 11 '19 10:06 andreaschandra

Sorry, just saw this. Let me see if I can do it.

feryandi avatar Jun 24 '19 05:06 feryandi

Any news about this? Zipping online either doesn't work or just gets stuck.

cahya-wirawan avatar Sep 22 '20 19:09 cahya-wirawan

I just saw the size of the whole folder. Is it really 60GB?

cahya-wirawan avatar Sep 23 '20 05:09 cahya-wirawan

HAHAHA. Just saw this issue while also trying to zip the folder.

Right now I'm copying each JSON file to my own Google Drive via Colab. Copying 110k files took 3 hours :/. I'll dump it into a CSV file next (which may or may not be a stupid idea :/). I'll update you guys when I'm done.
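A minimal sketch of the CSV dump mentioned above, assuming each JSON file holds a single flat object. The schema is not stated in this thread, so the function name `json_files_to_csv` and any field names are hypothetical.

```python
import csv
import json
from pathlib import Path


def json_files_to_csv(json_dir, csv_path):
    """Merge flat JSON objects (one per file) into one CSV; return the row count."""
    records = []
    for path in sorted(Path(json_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            records.append(json.load(f))

    # Union of keys across all files, in first-seen order.
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        # Keys missing from a record are written as empty cells.
        writer.writerows(records)
    return len(records)
```

The `csv` module quotes commas and line breaks inside article text, so the output stays parseable. This version keeps every record in memory; for very large collections, writing rows as each file is read would be the safer choice.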

wildangunawan avatar Dec 09 '21 09:12 wildangunawan

@wildangunawan I already uploaded it as a Hugging Face dataset: https://huggingface.co/datasets/id_newspapers_2018 so you don't need to copy it from Google Drive, which is too slow :-)

cahya-wirawan avatar Dec 09 '21 09:12 cahya-wirawan

Aaaahhh love it. Thanks Pak @cahya-wirawan :D. Perhaps Pak @feryandi can add the link to the Hugging Face dataset to the README.

wildangunawan avatar Dec 09 '21 09:12 wildangunawan