inference.py: keep raw heatmap to set confidence level for extracted features after post-processing

Open remtav opened this issue 3 years ago • 3 comments

The per-class inference values, before argmax() is applied, would help technicians sort extracted features by confidence level. Our inference script should let the user choose to output the raw heatmap (after softmax), not just the final result where argmax() has been applied.
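To make the distinction concrete, here is a minimal NumPy sketch of keeping both outputs. It is not taken from inference.py: the `softmax` helper, the (classes, height, width) array layout and the toy logits are assumptions for illustration only.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# Toy model output, assumed layout: (classes=2, height=2, width=2).
logits = np.array([
    [[2.0, 0.1], [0.4, 1.0]],   # class 0 scores
    [[0.5, 0.2], [0.3, 3.0]],   # class 1 scores
])

heatmap = softmax(logits, axis=0)   # per-class, per-pixel confidence
labels = heatmap.argmax(axis=0)     # final segmentation, confidence is lost here
```

The `heatmap` array is what this issue asks to keep; `labels` is what is currently written out.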

remtav avatar Dec 15 '21 20:12 remtav

This looks like a debug-level setting: info, verbose, etc. Using that kind of terminology for command-line arguments, as opposed to something like heatmap or probability map, would likely make the script easier to use. What the "output level" settings map to technically would be specified somewhere, e.g. verbose means heatmaps are saved together with the final segmentation results.

Heatmap values give technicians an indication of potential feature-type commission errors. Would they also need some intermediate result to help them with feature delineation? I suggest we have them document their needs in this ticket.

ymoisan avatar Dec 16 '21 14:12 ymoisan

Here's an idea for implementing this:

  1. write raw inference as raster, before argmax but after sigmoid/softmax, as .tif (currently it is written as .dat file with numpy memmap, and is deleted at end of inference)
  2. write the final inference as gpkg
  3. read final inference as geopandas dataframe.
  4. for each feature saved in gpkg:
     4.1. get bounds of feature and from those bounds read rasterio window of the raw inference (with per-class, per-pixel confidence levels)
     4.2. rasterize the single feature with its bounds (rectangular area) and create numpy mask from output (true where feature is, false where background)
     4.3. using numpy, calculate the mean confidence for all pixels where mask values are True
     4.4. write that mean value as integer attribute value for a "confidence" attribute
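Steps 4.3 and 4.4 can be sketched with NumPy alone, assuming the heatmap window (step 4.1) and the boolean feature mask (step 4.2) have already been produced. The function name `mean_confidence`, the (classes, height, width) layout and the 0-100 integer scaling are assumptions made for this sketch, not something decided in this ticket.

```python
import numpy as np


def mean_confidence(heatmap_window, feature_mask, class_index):
    """Mean confidence of one class over the pixels covered by a feature.

    heatmap_window: float array (classes, height, width), values in [0, 1]
    feature_mask:   bool array (height, width), True where the feature is
    class_index:    index of the feature's predicted class
    Returns an integer percentage (assumed scaling), or None for an empty mask.
    """
    if not feature_mask.any():
        return None
    values = heatmap_window[class_index][feature_mask]
    return int(round(float(values.mean()) * 100))


# Toy window for a single feature of class 1.
window = np.array([
    [[0.1, 0.2], [0.8, 0.3]],   # class 0 confidence
    [[0.9, 0.8], [0.2, 0.7]],   # class 1 confidence
])
mask = np.array([[True, True], [False, True]])

confidence = mean_confidence(window, mask, class_index=1)
```

The returned value would then be written to the feature's "confidence" attribute in the geopandas dataframe.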

Step 4 could be parallelized using python multiprocessing and would speed up the whole process. See example of multiprocessing implementation in my solaris_tiling branch.

It goes without saying that, in our use case (semantic segmentation), this would be pretty "calculation" and "memory" intensive. Tests need to be done.

remtav avatar Mar 01 '22 19:03 remtav

Could we avoid vectorizing then re-rasterizing by writing a 2-channel raster with the predicted class value in channel 1 and a confidence value in channel 2? We could use a flag (e.g. verbose=true?) as a runtime parameter meaning the raster output would be kept. That parameter would default to false.
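A rough sketch of what that 2-channel array could look like, before it is written to disk. The function name `class_and_confidence`, the use of the winning class probability as the confidence value, and the uint8 0-100 scaling are all assumptions for illustration.

```python
import numpy as np


def class_and_confidence(heatmap):
    """Build a 2-channel array from a (classes, height, width) heatmap.

    Channel 0: predicted class index (argmax over classes).
    Channel 1: confidence of that class, scaled to 0-100 (assumed scaling).
    """
    labels = heatmap.argmax(axis=0).astype(np.uint8)
    confidence = np.round(heatmap.max(axis=0) * 100).astype(np.uint8)
    return np.stack([labels, confidence])


# Toy heatmap: 2 classes, 1 x 2 pixels.
heatmap = np.array([
    [[0.9, 0.4]],   # class 0 confidence
    [[0.1, 0.6]],   # class 1 confidence
])

two_channel = class_and_confidence(heatmap)
```

Storing both channels as uint8 keeps the output small, at the cost of rounding the confidence to whole percentage points.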

ymoisan avatar Mar 01 '22 20:03 ymoisan