NLCD - Dataset not found in `paths='data'`

Open robmarkcole opened this issue 5 months ago • 4 comments

Description

With download=True, a 42-byte tif is downloaded, but the following error is raised: DatasetNotFoundError: Dataset not found in `paths='data'` and cannot be automatically downloaded, either specify a different `paths` or manually download the dataset.

Steps to reproduce

from torchgeo.datasets import NLCD

dataset = NLCD(
    years=[2023], 
    download=True, 
    checksum=False
)

Version

0.7.1

robmarkcole avatar Jul 09 '25 13:07 robmarkcole

I took a look at this, and it appears that the download returns an invalid file that can't be opened with rasterio. Need to investigate further.
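A quick way to catch this kind of bad download before handing the file to rasterio is to check its size and TIFF magic bytes. This is only a rough sketch using the standard library: `looks_like_tiff` and the `min_size` threshold are hypothetical names and values chosen for illustration, not part of TorchGeo, and a file that passes could still fail to open in rasterio.

```python
from pathlib import Path

# Magic bytes for classic TIFF and BigTIFF, little- and big-endian.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def looks_like_tiff(path, min_size=1024):
    """Cheap sanity check on a downloaded file.

    `min_size` is an arbitrary assumption: anything this small
    (such as the 42-byte file from this issue) cannot be a real raster.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size < min_size:
        return False
    with p.open("rb") as f:
        return f.read(4) in TIFF_MAGICS
```

Something like this could run right after the download step, so a truncated or placeholder response is reported as a failed download rather than as a missing dataset.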

edit: seems like all the NLCD download links are broken and return a 403 Forbidden status, except for 2023, which just returns a corrupt file.

edit2: seems like the files are now hosted in zip files here: https://www.mrlc.gov/downloads/sciweb1/shared/mrlc/data-bundles/Annual_NLCD_LndCov_2024_CU_C1V1.zip so we could update the dataset to download from this URL structure instead and unzip.
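Roughly, the new flow could look like the sketch below. It assumes that other years follow the same file naming pattern as the 2024 link above, which has not been verified here. `nlcd_zip_url` and `extract_tifs` are hypothetical helper names, and only the standard library is used.

```python
import zipfile
from pathlib import Path

# Assumption: every year uses the same pattern as the 2024 link above.
URL_TEMPLATE = (
    "https://www.mrlc.gov/downloads/sciweb1/shared/mrlc/data-bundles/"
    "Annual_NLCD_LndCov_{year}_CU_C1V1.zip"
)


def nlcd_zip_url(year):
    """Build the zip URL for one year from the assumed pattern."""
    return URL_TEMPLATE.format(year=year)


def extract_tifs(zip_path, out_dir):
    """Extract only the .tif members of the archive and return their paths."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as zf:
        members = [m for m in zf.namelist() if m.lower().endswith(".tif")]
        zf.extractall(out_dir, members=members)
    return [out_dir / m for m in members]
```

Extracting only the `.tif` members is a design choice for this sketch; the real archives may contain sidecar files that are worth keeping.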

isaaccorley avatar Jul 11 '25 15:07 isaaccorley

Hello, I'm currently working on this bugfix. Since the downloaded file type changed to zip, I am going to use Chesapeake as a template for processing the zip file.

One question: is there a reason why the zip files are not deleted after extraction? Do you think it would be worth deleting the zip file after extraction by default, with a parameter to keep it if necessary?

gatienc avatar Dec 08 '25 22:12 gatienc

@adamjstewart can confirm, but I assume we want to keep the current behaviour

robmarkcole avatar Dec 09 '25 10:12 robmarkcole

We rely on torchvision.datasets.utils for most of our download/extract utilities. Many of these functions do have a remove_finished flag we could use for this, but I don't think any datasets in TorchGeo or torchvision use it. I don't really care either way, but would like to remain consistent. If people want to manually delete the zipfile themselves, we should not redownload it, just use the extracted version.
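To illustrate the last point, the check order could look something like the sketch below. It is not the TorchGeo implementation: `ensure_extracted` and its `download` callback are hypothetical, it uses the standard library `zipfile` instead of the torchvision utilities, and it assumes the tif sits at the top level of the archive.

```python
import zipfile
from pathlib import Path


def ensure_extracted(root, tif_name, zip_name, download):
    """Return the tif path, doing the least work needed to obtain it.

    `download` is a hypothetical callable that fetches the zip to the
    path it is given. Assumes `tif_name` is a top-level archive member.
    """
    root = Path(root)
    tif_path = root / tif_name
    zip_path = root / zip_name
    if tif_path.exists():
        # The extracted copy wins, even if the user deleted the zip.
        return tif_path
    if not zip_path.exists():
        download(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(root)
    return tif_path
```

With this order, deleting the zip by hand never triggers a second download as long as the extracted file is still in place.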

adamjstewart avatar Dec 09 '25 12:12 adamjstewart