How to quickly extract the dataset?
I downloaded the .tar.gz file from https://huggingface.co/datasets/TIGER-Lab/M-BEIR, but it's very large, and pv estimates that extracting it will take 2.5 days!
Could you provide smaller archives, for example one zip file per dataset? Thanks very much!
After downloading the .tar.gz part files, combine them into a single file with this command: sh -c 'cat mbeir_images.tar.gz.part-00 mbeir_images.tar.gz.part-01 mbeir_images.tar.gz.part-02 mbeir_images.tar.gz.part-03 > mbeir_images.tar.gz'
Next, extract the images from the combined file: tar -xzf mbeir_images.tar.gz
It will not take 2.5 days. I was able to complete the whole process in about 10 hours.
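The two steps above can also be done in a single pass by piping the concatenated parts straight into tar, so the combined .tar.gz never has to be written to disk. This is a minimal sketch run on a tiny synthetic archive; "demo.tar.gz" and "images/" are hypothetical stand-ins for mbeir_images.tar.gz and its contents, and the parts use POSIX split suffixes (part-aa, part-ab, ...) rather than part-00, part-01, ...

```shell
#!/bin/sh
# Sketch: combine split .tar.gz parts and extract them in one streaming pass.
# Uses a tiny synthetic archive; "demo.tar.gz" and "images/" are
# hypothetical stand-ins for mbeir_images.tar.gz and its contents.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Build a small .tar.gz and split it into 40-byte parts named
# demo.tar.gz.part-aa, demo.tar.gz.part-ab, ... (POSIX split suffixes).
mkdir -p images/demo
echo "hello" > images/demo/a.txt
echo "world" > images/demo/b.txt
tar -czf demo.tar.gz images
split -b 40 demo.tar.gz demo.tar.gz.part-
rm -rf images demo.tar.gz

# The shell sorts glob matches, so the parts are concatenated in order.
# "-f -" makes tar read the archive from stdin, so the combined
# .tar.gz is never written to disk before extraction.
mkdir out
cat demo.tar.gz.part-* | tar -xzf - -C out

ls out/images/demo
```

On the real files the equivalent would be cat mbeir_images.tar.gz.part-* | tar -xzf - (or cat mbeir_images.tar.gz.part-* | pv | tar -xzf - to watch progress). This saves disk space and one full write of the archive, but gzip decompression itself stays single-threaded, so it may not help much if the CPU is the bottleneck.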
Thanks. But I'm extracting on a 2-core CPU, so it still takes a long time. It would be better if you split the archive into many smaller zip files.
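Until smaller archives are available, one possible workaround is to extract only the dataset folder you need. This is a minimal sketch that assumes the images are grouped into one top-level folder per dataset; the folder names mbeir_images/dataset_a and mbeir_images/dataset_b are hypothetical, so list the real archive first to find the actual paths. Because gzip cannot seek, tar still decompresses the whole stream; the saving is in disk writes and space, not decompression time.

```shell
#!/bin/sh
# Sketch: list an archive's layout and extract only one top-level folder.
# Archive and folder names are hypothetical stand-ins for the real data.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny archive with two hypothetical dataset folders.
mkdir -p mbeir_images/dataset_a mbeir_images/dataset_b
echo "a" > mbeir_images/dataset_a/1.txt
echo "b" > mbeir_images/dataset_b/1.txt
tar -czf archive.tar.gz mbeir_images
rm -rf mbeir_images

# Peek at the member paths. head exits after 20 lines, which ends the
# tar process early, so on a large archive this returns quickly.
tar -tzf archive.tar.gz | head -n 20

# Extract only the members under one folder; the rest of the archive
# is still decompressed but not written to disk.
tar -xzf archive.tar.gz mbeir_images/dataset_a
```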