
Add instructions on downloading the DeepGlobeLandCover dataset

Open robmarkcole opened this issue 1 year ago • 2 comments

Issue

The dataset docs state "The dataset that we use with a custom train/test split can be downloaded from Kaggle". However, the manual download is a necessity rather than an option, as you cannot pass download=True.

Fix

I suggest either documenting the steps using the kaggle CLI (below), or at least stating that the download must be performed manually. Alternatively, host the dataset on Hugging Face and automate the download.

pip install kaggle  # place your API key at ~/.kaggle/kaggle.json
mkdir -p data && cd data  # create the target directory if it does not exist yet
kaggle datasets download -d geoap96/deepglobe2018-landcover-segmentation-traindataset
unzip deepglobe2018-landcover-segmentation-traindataset.zip
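
The manual steps above could also be wrapped in a small script that is safe to re-run. This is only a sketch: the function names check_kaggle_token and fetch_deepglobe are hypothetical helpers (not part of torchgeo or the kaggle CLI), and it assumes the archive is named after the last component of the dataset slug, as in the unzip step above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: confirm a Kaggle API token exists before downloading.
# The default location ~/.kaggle/kaggle.json is the one named in the steps above.
check_kaggle_token() {
    local token="${1:-$HOME/.kaggle/kaggle.json}"
    if [ ! -f "$token" ]; then
        echo "missing Kaggle API token: $token" >&2
        return 1
    fi
    # Restrict the token to the current user; the kaggle CLI warns
    # when the file is readable by other users.
    chmod 600 "$token"
}

# Hypothetical helper: download the archive only if it is not already
# present, then extract it without overwriting existing files.
fetch_deepglobe() {
    local dest="${1:-data}"
    local slug="geoap96/deepglobe2018-landcover-segmentation-traindataset"
    # Assumption: the archive is named after the last part of the slug.
    local archive="$dest/${slug##*/}.zip"

    check_kaggle_token || return 1
    mkdir -p "$dest"
    if [ ! -f "$archive" ]; then
        kaggle datasets download -d "$slug" -p "$dest" || return 1
    fi
    unzip -n -q "$archive" -d "$dest"
}
```

Sourcing the script and running `fetch_deepglobe data` would then reproduce the four commands above; a second run skips the download because the archive is already in place.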

robmarkcole, Jan 04 '24 15:01

> Alternatively host on Huggingface and automate the download

We haven't yet found the license for the dataset, so I'm not sure if we can do this. Want to try to reach out to the authors of the dataset to see? If the license permits redistribution, I'll add it to our Hugging Face account.

adamjstewart, Jan 04 '24 16:01

Reading the paper, it appears the imagery is from DeepGlobe/Maxar. I will check my network for contacts there.

robmarkcole, Jan 04 '24 16:01

Any updates on this?

adamjstewart, Feb 29 '24 11:02

Re: the license, I came to a dead end. The people I spoke to who had worked on the dataset have all moved on.

As for updating the docs, could you clarify what level of detail we should provide?

robmarkcole, Feb 29 '24 11:02

https://torchgeo.readthedocs.io/en/stable/api/datasets.html#torchgeo.datasets.SSL4EOS12 is an example of a similar dataset where users have to manually download it. So a similar level of detail is fine. Want to submit a PR?

adamjstewart, Feb 29 '24 12:02