YCB_Video_toolbox
Dataset too big!
This dataset is too big to download on my system. Is there any way you could provide a smaller version of the dataset for testing purposes?
Thanks.
I was able to download it nonetheless. Thanks.
Somebody uploaded the data to Baidu drive; you can spend 3 dollars on a premium membership and download it in 5 hours.
You can only register with Baidu using an Asian phone number.
No, you do not have to. If you do it on a computer, then yes. What you need to do is download the Baidu app from Google Play and register there; then you do not need to provide a Chinese phone number.
If anyone is looking to download the full YCB-V dataset, we have a hosted version on S3 and can give you access. Just reach out to [email protected] with the subject: YCB-V download
Hi @iandewancker, is the S3 bucket still active?
For anyone still struggling to access the dataset, here's how I managed to make it work with the shared Google Drive link.
- Have (or get) any Google Drive storage subscription so you won't get blocked from file access by the "high file traffic" warning
- Create a symlink (shortcut) to the dataset zipfile in your Drive by clicking the icon in the top-right corner of the shared file page
- Create a Google Colab instance and mount your Google Drive into its file system:
# Mount your Drive so the shared zipfile shows up under /content/gdrive
from google.colab import drive
drive.mount('/content/gdrive')
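Once the mount succeeds, it helps to point a variable at the zip and confirm it is reachable before doing anything else. A minimal sanity check, assuming the shortcut landed in the root of MyDrive and using a hypothetical filename (adjust both to your own Drive); the same zipfile variable is reused by the chunked-unzip cell further down:

import os

# Hypothetical path: adjust to wherever the shared YCB-Video zip shortcut lives in your Drive
zipfile = '/content/gdrive/MyDrive/YCB_Video_Dataset.zip'
assert os.path.exists(zipfile), 'Zip not found, check the shortcut location in Drive'
print(f'Zip size: {os.path.getsize(zipfile) / 1e9:.1f} GB')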
With a console in front of you and the zipfile in the file system, a bit of googling should get you the rest of the way.
In case you want to process the dataset in Colab anyway, here are a few more pitfalls to avoid.
- The dataset zip seems to contain the same data twice: unsorted files in /data_syn and organized ones in /data/{video_num}.
- Don't unzip the dataset into your Google Drive; Drive is extremely slow at accessing many small files. Either store it as a few uncompressed multi-GB zip chunks and retrieve/unzip them into the Colab instance as needed, or do it the proper way with memory-mapped files such as HDF5 (see the sketch after the chunked-unzip snippet below).
- A full extract will run out of RAM; consider unzipping in chunks:
%%capture
import math

# Get the list of all files contained in the zipfile
# (drop the header/footer lines of `unzip -l` and keep only the name column)
all_files = !unzip -l {zipfile}
all_files = [line[30:] for line in all_files[3:][:-2]]

# Extract 100 files per call so a single unzip invocation stays small;
# unzip_path is wherever you want the data inside the Colab instance
CHUNK = 100
n_chunks = math.ceil(len(all_files) / CHUNK)
for i in range(n_chunks):
    chunk = ' '.join(all_files[i * CHUNK:(i + 1) * CHUNK])
    !unzip -n {zipfile} {chunk} -d {unzip_path} 1>/dev/null
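Regarding the HDF5 suggestion above, here is a minimal sketch of the idea (not part of the toolbox). It assumes the extracted frames follow the usual NNNNNN-color.png naming inside data/{video_num} and packs one video's color frames into a single HDF5 file, so Drive and Colab only have to move one large file instead of thousands of small ones:

import glob
import h5py
import numpy as np
from PIL import Image

# Hypothetical example: pack the color frames of video 0000 into one HDF5 file.
# Paths and the frame-naming pattern are assumptions; adjust to your unzip_path and video.
frames = sorted(glob.glob(f'{unzip_path}/data/0000/*-color.png'))
with h5py.File('/content/video_0000_color.h5', 'w') as f:
    for i, path in enumerate(frames):
        img = np.asarray(Image.open(path))
        f.create_dataset(f'{i:06d}', data=img, compression='gzip')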