Creating composites using static background require double resampling when `generate=False` is used
Describe the bug
I'm trying to create the geo_color composite using FCI data and the built-in recipe. For some reason the Scene needs to be resampled twice for it to be saveable.
EDIT:
The key to the problem is using generate=False when composite(s) using data with different projections are loaded. Such as static background image with any satellite data.
To Reproduce
import glob
import hdf5plugin
from satpy import Scene
FNAMES = glob.glob("/home/lahtinep/data/satellite/geo/fci/*C_0088*nc")
def main():
"""Run."""
comps = ["geo_color"]
glbl = Scene(reader='fci_l1c_nc', filenames=FNAMES)
glbl.load(comps, generate=False)
lcl = glbl.resample("euro4", resampler="gradient_search", reduce_data=False)
# lcl2 = lcl.resample("euro4", resampler="gradient_search", reduce_data=False)
# lcl2.save_datasets(
lcl.save_datasets(
filename='/tmp/{start_time:%Y%m%d_%H%M}_MTG-I1_euro4_{name}.tif',
tiled=True, blockxsize=512, blockysize=512,
driver='COG', overviews=[])
if __name__ == "__main__":
main()
Expected behavior The image should be saved after a single resampling.
Actual results
RuntimeError: None of the requested datasets have been generated or could not be loaded. Requested composite inputs may need to have matching dimensions (eg. through resampling).
Environment Info:
- OS: Linux
- Satpy Version: current
mainbranch - PyResample Version: current
mainbranch
Additional context
If the lcl2 lines are taken to use, and thus the scene is resampled twice, the product is generated correctly.
Same problem is seen if I add night_ir_with_background composite (with sub-composites) to fci.yaml.
It seems that setting generate=True makes the composite to work. This also means a significant performance hit when producing multiple composites sharing channels (instead of single resampling per channel, all RGBs are resampled separately).
This doesn't seem to help: lcl = glbl.resample("euro4", resampler="gradient_search", reduce_data=False, generate=True)
Neither does lcl.generate_possible_composites(True) before saving, which outputs The following datasets were not created and may require resampling to be generated: DataID(name='geo_color').
Just for record keeping, can you print out glbl.keys() when generate=False? Then can you print out .attrs["area"] for everything in lcl.values() and compare? Or maybe do set([x.attrs["area"] for x in lcl.values()]) and print that out.
For the glbl Scene:
ipdb> for k in glbl.keys(): print(k)
DataID(name='ir_105', wavelength=WavelengthRange(min=9.8, central=10.5, max=11.2, unit='µm'), resolution=1000, calibration=<2>, modifiers=())
DataID(name='ir_38', wavelength=WavelengthRange(min=3.4, central=3.8, max=4.2, unit='µm'), resolution=1000, calibration=<2>, modifiers=())
DataID(name='vis_04', wavelength=WavelengthRange(min=0.384, central=0.444, max=0.504, unit='µm'), resolution=1000, calibration=<1>, modifiers=())
DataID(name='vis_05', wavelength=WavelengthRange(min=0.47, central=0.51, max=0.55, unit='µm'), resolution=1000, calibration=<1>, modifiers=())
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<1>, modifiers=())
DataID(name='vis_08', wavelength=WavelengthRange(min=0.815, central=0.865, max=0.915, unit='µm'), resolution=1000, calibration=<1>, modifiers=())
ipdb> for v in set([x.attrs["area"] for x in glbl.values()]): print(v); print("")
Area ID: mtg_fci_fdss_1km
Description: MTG FCI Full Disk Scanning Service area definition with 1 km resolution
Projection: {'ellps': 'WGS84', 'h': '35786400', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 11136
Number of rows: 11136
Area extent: (-5567999.9986, 5567999.9986, 5567999.9986, -5567999.9986)
Area ID: mtg_fci_fdss_500m
Description: MTG FCI Full Disk Scanning Service area definition with 500 m resolution
Projection: {'ellps': 'WGS84', 'h': '35786400', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 22272
Number of rows: 22272
Area extent: (-5567999.9996, 5567999.9996, 5567999.9996, -5567999.9996)
and for lcl Scene:
ipdb> for k in lcl.keys(): print(k)
DataID(name='_geo_color_low_clouds_dep_0', resolution=1000)
DataID(name='_geo_color_low_clouds_dep_2')
DataID(name='_night_background_hires')
DataID(name='geo_color_high_clouds', resolution=1000)
DataID(name='ir_105', wavelength=WavelengthRange(min=9.8, central=10.5, max=11.2, unit='µm'), resolution=1000, calibration=<2>, modifiers=())
DataID(name='true_color', resolution=500)
ipdb> for v in set([x.attrs["area"] for x in lcl.values()]): print(v); print("")
Area ID: None
Description: None
Projection ID: WGS 84
Projection: {'datum': 'WGS84', 'no_defs': 'None', 'proj': 'longlat', 'type': 'crs'}
Number of columns: 13500
Number of rows: 6750
Area extent: (-180.0, -90.0, 180.0, 90.0)
Area ID: euro4
Description: Euro 4km area - Europe
Projection: {'ellps': 'bessel', 'lat_0': '90', 'lat_ts': '60', 'lon_0': '14', 'no_defs': 'None', 'proj': 'stere', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 1024
Number of rows: 1024
Area extent: (-2717181.7305, -5571048.1403, 1378818.2695, -1475048.1403)
So the static background isn't loaded with generate=False.
For generate=True it is (via _night_background_hires sub-composite):
ipdb> for k in glbl.keys(): print(k)
DataID(name='_geo_color_low_clouds_dep_0', resolution=1000)
DataID(name='_geo_color_low_clouds_dep_2')
DataID(name='_night_background_hires')
DataID(name='geo_color_high_clouds', resolution=1000)
DataID(name='ir_105', wavelength=WavelengthRange(min=9.8, central=10.5, max=11.2, unit='µm'), resolution=1000, calibration=<2>, modifiers=())
DataID(name='vis_04', wavelength=WavelengthRange(min=0.384, central=0.444, max=0.504, unit='µm'), resolution=1000, calibration=<1>, modifiers=('sunz_corrected',))
DataID(name='vis_05', wavelength=WavelengthRange(min=0.47, central=0.51, max=0.55, unit='µm'), resolution=1000, calibration=<1>, modifiers=('sunz_corrected',))
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<1>, modifiers=('sunz_corrected',))
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<1>, modifiers=('sunz_corrected', 'rayleigh_corrected'))
DataID(name='vis_08', wavelength=WavelengthRange(min=0.815, central=0.865, max=0.915, unit='µm'), resolution=1000, calibration=<1>, modifiers=('sunz_corrected',))
ipdb> for v in set([x.attrs["area"] for x in glbl.values()]): print(v); print("")
Area ID: mtg_fci_fdss_1km
Description: MTG FCI Full Disk Scanning Service area definition with 1 km resolution
Projection: {'ellps': 'WGS84', 'h': '35786400', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 11136
Number of rows: 11136
Area extent: (-5567999.9986, 5567999.9986, 5567999.9986, -5567999.9986)
Area ID: None
Description: None
Projection ID: WGS 84
Projection: {'datum': 'WGS84', 'no_defs': 'None', 'proj': 'longlat', 'type': 'crs'}
Number of columns: 13500
Number of rows: 6750
Area extent: (-180.0, -90.0, 180.0, 90.0)
Area ID: mtg_fci_fdss_500m
Description: MTG FCI Full Disk Scanning Service area definition with 500 m resolution
Projection: {'ellps': 'WGS84', 'h': '35786400', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 22272
Number of rows: 22272
Area extent: (-5567999.9996, 5567999.9996, 5567999.9996, -5567999.9996)
I adjusted the title and added the short summary what triggers the behaviour to the description.
I tested by pre-resampling the background image to FCI projection and euro4, neither of which helped and a second resampling step is still required when generate=False is used.
Above I forgot to also resample the land-sea mask. If both background image and LSM are resampled to the final image projection, the composite generation works with a single resample() call even with scn.load(..., generate=False). I can use this as a background, but it'd be great to find out what causes this behaviour.
Do you still have this problem with my #2696 ? I don't know if this is a compositor issue or something else...
@yukaribbba yes, with #2696 I still need to do douple resampling when using generate=False.
I think the problem is somewhere in the decision tree in a case where the IncompatibleAreas is raised by a sub-composite.
Good discovery @pnuu!
Ok, this is me debugging and brainstorming:
-
StaticCompositor nodes shouldn't have any dependencies so they are treated like this:
https://github.com/pytroll/satpy/blob/33f354f9d6d8c9e62ca8f79d4cb60809d20b6235/satpy/dependency_tree.py#L563-L567
The idea is that even though the static compositor doesn't have any dependencies we need to add a fake empty node as a dependency. Otherwise this compositor will get treated like a reader-loaded dataset when we start loading things by doing
dependency_tree.leaves()anddependency_tree.trunk(). That is, composites and modified datasets are trunk nodes, reader datasets are leaves. -
Trunk nodes aren't accessed anywhere in the Scene for loading except for here (ignore "available_composite_ids", etc):
https://github.com/pytroll/satpy/blob/f11db543b2d42e634ddc6a8c935d17094ea1bc4c/satpy/scene.py#L1532-L1537
But this isn't called when
generate=False. -
By saying
generate=False, you're basically saying "don't look at composites at all" as an optimization essentially.
So the questions/choices are:
- Can we fix this by making static compositor nodes (or composites with no dependencies) leaf nodes in the dependency tree and load them like reader datasets?
- Does doing the above remove one of the expectations and intentions of users by using the
generate=Falsekeyword argument? - The above solution would also require changes to
available_Xmethods as they assume trunk nodes are composites and leaf nodes are from the reader. Gets a little confusing to explain to users what we mean if we start mixing these up. - Alternatively, should we (the Scene) look for trunk nodes with no real dependencies even when generate=False?
- Other solutions?
* Other solutions?
I think the easiest route, at least for my own worklfow, is to pre-resample the background images (black marble and land-sea mask) to my target projection and adjust the composite configs in $SATPY_CONFIG_PATH/composites/<instrument>.yaml to point url to the resampled versions. This means I'll need a separate variant for each target area, but for me that's ok since outside of testing I pretty much only ever use one or two areas.
FWIW, I have encountered this problem just now with one of my own composites...BUT, the static image background is on the same projection / area as the rest of the data.
@simonrp84 so scn[chan].attrs['area'] == scn2['image'].attrs['area'] is True?