gdal icon indicating copy to clipboard operation
gdal copied to clipboard

gdal_footprint enhancement

Open hobu opened this issue 2 years ago • 1 comments

A convenient but not extravagant application-level footprint method would be quite useful. Software such as stactools has such a thing in it, and there are probably many more partial implementations throughout the GDAL-using ecosystem. This ticket explores implementation of the functionality in a command line utility.

The proposed algorithm should be straightforward and simple. Multi-resolution or simplification capability of the utility is desired, with support for completely excluding nodata, thresholding positive space, polygon simplification, and multiple OGR-writeable output is desired.

References

https://github.com/stac-utils/stactools/blob/main/src/stactools/core/utils/raster_footprint.py https://gis.stackexchange.com/a/408628/350

https://gis.stackexchange.com/questions/61512/calculating-image-boundary-footprint-of-satellite-images-using-open-source-too

Default Options Example

Compute foot print with sensible defaults

  • All bands
  • Full resolution
  • NO nodata pixels in bands or nodata mask
  • Default polygon simplification
gdal_footprint input.tif output.json

Compute footprint of Band 1 only at first resolution index

In this case, we only care about Band 1, but we want to do it on the first (presumably 1/2) overview level. This will result in a coarser polygon, but hopefully a much faster compute

gdal_footprint input.tif output.json -b 1 -oi 2

Compute footprint of all bands at overview resolution 0.35

Same as above, but support stating resolution instead of 0.35

gdal_footprint input.tif output.shp -b 1 -or 0.35

Please add ideas, experiences, and techiques to the ticket. It is also valid that this functionality does not belong as a GDAL command line utility, but we were exploring such a thing at the FOSS4G2022 Code Sprint.

hobu avatar Aug 27 '22 13:08 hobu

I like this tool https://github.com/gina-alaska/dans-gdal-scripts/blob/master/src/gdal_trace_outline.cc a lot. It has some advanced features which are very useful for us (National Land Survey of Finland) when we generate footprints for historical orthophotos (erosion, min-ring-area, dp-toler, selection of out-cs including a possibility to use pixel coordinates).

What we miss in gdal_trace_outline is a possibility to write the rasterized result (-mask-out) directly into a new alpha or mask band that would be added to the original image. Now we need to do that as a separate step by rasterizing the polygon footprint back. And why we do want to convert nodata into alpha/mask is because nodata value does not really work if images are compressed with some lossy method. Black and white orthophotos also tend to have pure black and white pixels in real data (black water bodies, sun reflecting from tin roofs etc.). Gdal_trace_outline fixes those problems well with erosion and min-ring-area.

License of gdal_trace_outline is compatible but boost is probably too big as a new dependency. However, I recommend to use that utility as a reference.

Partial GDAL implementations were mentioned and I have tried gdal_contour with fixed level set to nodata value and with polygon output. That is not the way to go I think. Nearblack with setalpha/setmask does good work with saving what can be saved from nodata that is ruined with lossy compression and also with uncompressed images with good nodata. Nearblack had problems with some concave nodata areas in the middle of the image even it tries to handle them with double scanning "The algorithm also scans from top to bottom and from bottom to top to identify indentations in the top or bottom."

Here is an example of those bit more difficult images image

jratike80 avatar Aug 28 '22 00:08 jratike80

I think it would be great to have a single tool in GDAL to generate footprint polygons from a raster. I'm a fan of dans-gdal-tools but I went through a period of time where I couldn't compile it on systems I was using so I came up with a 2-step process to generate footprints of rasters using tools that ship with GDAL.

  1. Say you have an 8-bit image with a NoData value of 0, use gdal_translate to stretch values [1,255] to [1,1], and assign 0 to represent the NoData value on output. (Optionally, save result as VRT rather than writing a new raster to disk): gdal_translate -scale 1 255 1 1 -ot Byte -of vrt -a_nodata 0 input_ortho.tif input_ortho_mask.vrt

  2. Create a polygon shapefile that outlines the valid (DN==1) regions of the VRT just created gdal_polygonize.py -8 input_ortho_mask.vrt -f "ESRI Shapefile" mask_footprint.shp mask_footprint DN

@jratike80 I think Step 1 above partially addresses your desire for a raster mask. Instead of outputting to VRT, the mask can be written to some other file type and fed into another GDAL command for use as an alpha or mask band.

dpmayerUSGS avatar May 25 '23 21:05 dpmayerUSGS

@jratike80 Regarding "Nearblack had problems with some concave nodata areas in the middle of the image", on your example nearblack 187082488-6b907631-9f82-48c5-b696-9b886512225a.png -o out.tif -color 0,0,0 -color 255,255,255 -setalpha -nb 0 gives a correct result. The issue with that example is that there's a 1 pixel black border around the white padding, so you need to specify both colors. That said for "spiral-like" concave patterns, the 2-pass algorithm may be defeated. I'm working on implementing an alternate algorithm in nearblack based on the Flood fill algorithm (https://en.wikipedia.org/wiki/Flood_fill) to fix the most difficult cases

rouault avatar May 31 '23 09:05 rouault

No, the original tiff does not have a black border. The problematic place for nearblack is here (the location is at about pixel coordinates 60%, 60% from the top left corner. This is the output from nearblack. Notice the white stripe.

image

It looks to me that a horizontal scan fills nodata here, but the vertical scan does not happen because the data block at the top-right corner stops the scanning.

jratike80 avatar May 31 '23 10:05 jratike80

Nearblack floodfill algorithm implemented in https://github.com/OSGeo/gdal/pull/7877

rouault avatar May 31 '23 16:05 rouault

@hobu I'm not totally clear on what you've in mind if all bands are to be used, rather than selecting a single band. In the situation when there is no global per-dataset mask band (a situation where we have a global mask band is for example a RGBA dataset), do you expect a footprint per band to be generated (using the nodata value or mask band of each band), or do you expect some combination of the mask bands into a single one to generate a global footprint, in which case I can imagine that depending on the situation, we can want to either use the union or intersection of all mask bands. But that becomes a bit complicated, so I want to be sure if that meets an actual use case

I would say that (when no particular band is selected):

  • if there is a per-data mask band, compute a single footprint
  • if there is none, compute a footprint per band

rouault avatar Jun 16 '23 11:06 rouault

When the images are "natural" images like RGB or RGB-NIR I do not think that a footprint per band is a good default. Every band tend to have extreme values.

jratike80 avatar Jun 16 '23 12:06 jratike80

When the images are "natural" images like RGB or RGB-NIR I do not think that a footprint per band is a good default. Every band tend to have extreme values.

In that case, people will have to run first nearblack to set a mask or apha band (assuming the data type is Byte). Or did you have something else in mind? My above question was for the generic case, where it might be multispectral or hyperspectral data, or perhaps time-series represented as bands, or a unknown case, where we can't assume any correlation between bands

rouault avatar Jun 16 '23 13:06 rouault

hum, for multi-band, https://github.com/stac-utils/stactools/blob/main/src/stactools/core/utils/raster_footprint.py seems to take the union of the validity masks: https://github.com/stac-utils/stactools/blob/main/src/stactools/core/utils/raster_footprint.py#LL263C26-L263C26

(one of the challenges compared to existing approaches, is that given the GDAL philosophy of being able to address arbitrarily large datasets, we need to avoid full ingestion in RAM of the rasters, which complicates implementation)

rouault avatar Jun 16 '23 13:06 rouault