torchgeo
NLCD2016 Tree Canopy
This PR adds the NLCD2016 Tree Canopy dataset.
See https://www.mrlc.gov/data/nlcd-2016-usfs-tree-canopy-cover-conus
Comparison of the IMG format to COG format:
- IMG is uncompressed; the tree canopy dataset alone is 20 GB on disk. That is 16832104560 pixels at 1 byte each, plus overviews :)
- IMG format consists of a .html, .ige, .img, and .img.xml file
- COG is 4.2 GB total, no extra files, lossless compression
- Random windowed reads on the IMG data take ~0.6 seconds per 1000 reads
- Random windowed reads on the COG data take 2.1 seconds per 1000 reads
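
For anyone who wants to reproduce the timings above, here is a minimal sketch of a random windowed read benchmark. It is not the script used for the numbers in this PR; the window size (256), the seed, and both function names are assumptions for illustration, and it assumes `rasterio` is installed.

```python
import random
import time


def random_windows(width, height, size=256, n=1000, seed=0):
    """Return n (col_off, row_off, size, size) windows that fit inside the raster."""
    rng = random.Random(seed)
    return [
        (rng.randint(0, width - size), rng.randint(0, height - size), size, size)
        for _ in range(n)
    ]


def time_random_reads(path, size=256, n=1000, seed=0):
    """Time n random windowed reads of band 1 and return the elapsed seconds."""
    # rasterio is imported lazily so random_windows() works without it installed.
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(path) as src:
        # Generate the windows up front so only the reads are timed.
        windows = random_windows(src.width, src.height, size, n, seed)
        start = time.perf_counter()
        for col_off, row_off, w, h in windows:
            src.read(1, window=Window(col_off, row_off, w, h))
        return time.perf_counter() - start
```

Calling `time_random_reads` once on the `.img` file and once on the COG, with the same seed, reads the same windows from both, so the two timings are comparable. Repeated runs will be affected by the OS page cache.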

Surprised img is faster than COGs, I thought COGs were the gold standard.
This will need to be rebased once #1244 is merged.
> Surprised img is faster than COGs, I thought COGs were the gold standard.
I'm guessing the difference is compression-related (the COG is ~5x smaller and ~3x slower to read). It is apples to oranges as if these were hosted on a remote server, you could still do windowed reading quickly with a COG.
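
As a quick sanity check on those ratios, the arithmetic can be worked out from the figures in the PR description. This is only a sketch using the rounded numbers quoted there (20 GB, 4.2 GB, ~0.6 s, 2.1 s); the variable names are made up for illustration.

```python
pixels = 16_832_104_560          # 1 byte per pixel, from the PR description
raw_gb = pixels / 1e9            # raw pixel data in GB, before overviews

img_gb, cog_gb = 20.0, 4.2       # on-disk sizes quoted in the PR description
size_ratio = img_gb / cog_gb     # how many times smaller the COG is

img_s, cog_s = 0.6, 2.1          # seconds per 1000 random windowed reads
time_ratio = cog_s / img_s       # how many times slower the COG reads are

print(round(raw_gb, 2), round(size_ratio, 2), round(time_ratio, 2))
# → 16.83 4.76 3.5
```

So the raw pixels account for about 16.8 GB of the 20 GB, with the remainder attributable to overviews as noted in the description, and the rounded timings give a slowdown closer to 3.5x.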
Confirmed that the difference is entirely compression related:

(for completeness, because I was curious)
> It is apples to oranges as if these were hosted on a remote server, you could still do windowed reading quickly with a COG.
Not quite, actually: you can still do windowed reads from remote files in the Erdas Imagine format, but they are 2x slower than with COGs. Also, compression vs. no compression doesn't seem to matter when reading remote files (compressed looks slightly faster, which makes sense since the time it takes to transfer the data is going to dominate).

TL;DR -- use COGs