Change of Link to Substation Dataset

Open rijuld opened this issue 7 months ago • 2 comments

Hi @adamjstewart, I want to change the link to the Substation dataset to Hugging Face for the next release, but I am getting the following error when doing so. This happens because the dataset is large and I have to split it into parts. Can I implement a custom function, using an external library, to handle this, or is there an alternative?

File /ext3/miniforge3/lib/python3.12/zipfile/__init__.py:257, in _EndRecData64(fpin, offset, endrec)
    254     return endrec
    256 if diskno != 0 or disks > 1:
--> 257     raise BadZipFile("zipfiles that span multiple disks are not supported")
    259 # Assume no 'zip64 extensible data'
    260 fpin.seek(offset - sizeEndCentDir64Locator - sizeEndCentDir64, 2)

BadZipFile: zipfiles that span multiple disks are not supported

rijuld avatar Apr 17 '25 14:04 rijuld

What's the total size of the .tar.gz? HF can host individual files up to 50 GB each. A larger file can be split into parts, but you'll need to merge the parts again before extraction. The SSL4EO-L dataset is a good example of this.
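As a rough illustration of the split-then-merge idea, here is a minimal sketch using only the Python standard library. The function names `split_file` and `merge_parts` and the `.partNNN` suffix are hypothetical choices for this example, not the scheme torchgeo or SSL4EO-L actually uses, so treat it as a starting point rather than the project's implementation.

```python
import shutil
from pathlib import Path


def split_file(src: Path, chunk_size: int, block_size: int = 1024 * 1024) -> list[Path]:
    """Split src into numbered parts of at most chunk_size bytes each.

    Hypothetical helper: parts are named <src>.part000, <src>.part001, ...
    Data is streamed in block_size pieces so a whole part never sits in memory.
    """
    parts: list[Path] = []
    index = 0
    with open(src, "rb") as f:
        while True:
            part = src.with_name(f"{src.name}.part{index:03d}")
            written = 0
            with open(part, "wb") as out:
                while written < chunk_size:
                    block = f.read(min(block_size, chunk_size - written))
                    if not block:
                        break  # reached end of the source file
                    out.write(block)
                    written += len(block)
            if written == 0:
                part.unlink()  # nothing left to write, drop the empty part
                break
            parts.append(part)
            index += 1
            if written < chunk_size:
                break  # last part was short, so the source is exhausted
    return parts


def merge_parts(parts: list[Path], dest: Path) -> None:
    """Concatenate parts, in sorted name order, back into a single file."""
    with open(dest, "wb") as out:
        for part in sorted(parts):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
```

After `merge_parts` has rebuilt the single archive, it can be extracted as usual. Zero-padded part numbers keep the sorted order equal to the original byte order.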

adamjstewart avatar Apr 17 '25 17:04 adamjstewart

It's around 70 GB. Okay, I will use SSL4EO-L as an example.

rijuld avatar Apr 17 '25 17:04 rijuld