Cropping RSS native format data does not work.
Describe the bug Sorry, I messed the initial report up and reported on the wrong thing! New version below: For SEVIRI rapid scan native format data, cropping the scene does not work and results in the full RSS image area being output without any warning messages. This appears to be because the HRV and VIS/IR channel area definitions do not match even though, to the best of my knowledge, they should.
To Reproduce
from satpy import Scene
from glob import glob
import warnings
warnings.filterwarnings('ignore')
bbox = (-1773039, 3000000, 898791, 5109278)
fname = glob('./*.nat')
# This will produce a full RSS area image, despite cropping
scn = Scene(fname, reader='seviri_l1b_native', reader_kwargs={'fill_disk': False})
scn.load(['_hrv', 'colorized_ir_clouds'], upper_right_corner='NE')
scn2 = scn.crop(xy_bbox = bbox)
scn2.save_dataset('_hrv', filename='./test.tif', fill_value=0)
Replacing {'fill_disk': False} with {'fill_disk': True} produces the expected and correct output.
Expected behavior A correctly cropped image is saved.
Actual results
The full RSS image, instead of a cropped version, is output if fill_disk is False.
Can you show me the debug log for one of the broken cases?
Possibly related: https://github.com/pytroll/satpy/issues/2476
Sure. This code:
scn = Scene(glob('./*.nat'), reader='seviri_l1b_native')
scn.load(['_hrv', 'colorized_ir_clouds'], upper_right_corner='NE', reader_kwargs={'fill_disk': True})
scn2 = scn.crop(xy_bbox = bbox)
scn2.save_dataset('_hrv', filename='./test.tif', fill_value=0)
Produces this debug log:
[DEBUG: 2023-06-20 13:32:59 : satpy.readers.yaml_reader] Reading ('C:\\Users\\simon\\PycharmProjects\\satpy\\satpy\\etc\\readers\\seviri_l1b_native.yaml',)
[DEBUG: 2023-06-20 13:32:59 : satpy.readers.yaml_reader] Assigning to seviri_l1b_native: ['.\\MSG4-SEVI-MSG15-0100-NA-20230619131916.234000000Z-NA.nat']
[DEBUG: 2023-06-20 13:33:00 : native_msg] Calibration time 0:00:00.016791
[INFO: 2023-06-20 13:33:00 : satpy.readers.yaml_reader] Flipping Dataset unknown_name upsidedown.
[INFO: 2023-06-20 13:33:00 : satpy.readers.yaml_reader] Flipping Dataset unknown_name leftright.
[DEBUG: 2023-06-20 13:33:00 : native_msg] Calibration time 0:00:00.025792
[INFO: 2023-06-20 13:33:00 : satpy.readers.yaml_reader] Flipping Dataset unknown_name upsidedown.
[INFO: 2023-06-20 13:33:00 : satpy.readers.yaml_reader] Flipping Dataset unknown_name leftright.
[DEBUG: 2023-06-20 13:33:00 : satpy.modifiers.geometry] Applying sun zen correction
[DEBUG: 2023-06-20 13:33:00 : satpy.modifiers.geometry] Computing sun zenith angles.
[DEBUG: 2023-06-20 13:33:00 : satpy.modifiers.geometry] Apply the standard sun-zenith correction [1/cos(sunz)]
[DEBUG: 2023-06-20 13:33:00 : satpy.scene] Unloading dataset: DataID(name='HRV', wavelength=WavelengthRange(min=0.5, central=0.7, max=0.9, unit='µm'), resolution=1000.134348869, calibration=<calibration.reflectance>, modifiers=())
[DEBUG: 2023-06-20 13:33:00 : satpy.scene] Unloading dataset: DataID(name='IR_108', wavelength=WavelengthRange(min=9.8, central=10.8, max=11.8, unit='µm'), resolution=3000.403165817, calibration=<calibration.brightness_temperature>, modifiers=())
[DEBUG: 2023-06-20 13:33:00 : satpy.scene] Unloading dataset: DataID(name='HRV', wavelength=WavelengthRange(min=0.5, central=0.7, max=0.9, unit='µm'), resolution=1000.134348869, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',))
[DEBUG: 2023-06-20 13:33:00 : pyresample.geometry] Projections for data and slice areas are identical: PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["unknown",ELLIPSOID["unknown",6378169,295.488065897014,LENGTHUNIT["metre",1,ID["EPSG",9001]]]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8901]]],CONVERSION["unknown",METHOD["Geostationary Satellite (Sweep Y)"],PARAMETER["Longitude of natural origin",9.5,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8802]],PARAMETER["Satellite Height",35785831,LENGTHUNIT["metre",1,ID["EPSG",9001]]],PARAMETER["False easting",0,LENGTHUNIT["metre",1],ID["EPSG",8806]],PARAMETER["False northing",0,LENGTHUNIT["metre",1],ID["EPSG",8807]]],CS[Cartesian,2],AXIS["(E)",east,ORDER[1],LENGTHUNIT["metre",1,ID["EPSG",9001]]],AXIS["(N)",north,ORDER[2],LENGTHUNIT["metre",1,ID["EPSG",9001]]]]
[DEBUG: 2023-06-20 13:33:00 : pyresample.geometry] Projections for data and slice areas are identical: PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["unknown",ELLIPSOID["unknown",6378169,295.488065897014,LENGTHUNIT["metre",1,ID["EPSG",9001]]]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8901]]],CONVERSION["unknown",METHOD["Geostationary Satellite (Sweep Y)"],PARAMETER["Longitude of natural origin",9.5,ANGLEUNIT["degree",0.0174532925199433],ID["EPSG",8802]],PARAMETER["Satellite Height",35785831,LENGTHUNIT["metre",1,ID["EPSG",9001]]],PARAMETER["False easting",0,LENGTHUNIT["metre",1],ID["EPSG",8806]],PARAMETER["False northing",0,LENGTHUNIT["metre",1],ID["EPSG",8807]]],CS[Cartesian,2],AXIS["(E)",east,ORDER[1],LENGTHUNIT["metre",1,ID["EPSG",9001]]],AXIS["(N)",north,ORDER[2],LENGTHUNIT["metre",1,ID["EPSG",9001]]]]
[DEBUG: 2023-06-20 13:33:00 : satpy.writers] Reading ['C:\\Users\\simon\\PycharmProjects\\satpy\\satpy\\etc\\writers\\geotiff.yaml']
[DEBUG: 2023-06-20 13:33:00 : rasterio.session] Could not import boto3, continuing with reduced functionality.
[DEBUG: 2023-06-20 13:33:01 : satpy.writers] Adding enhancement configuration from file: C:\Users\simon\PycharmProjects\satpy\satpy\etc\enhancements\generic.yaml
[DEBUG: 2023-06-20 13:33:01 : satpy.writers] Adding enhancement configuration from file: C:\Users\simon\PycharmProjects\satpy\satpy\etc\enhancements\seviri.yaml
[DEBUG: 2023-06-20 13:33:01 : satpy.writers] Data for DataID(name='_hrv', resolution=1000.134348869) will be enhanced with options:
[{'name': 'stretch', 'method': <function stretch at 0x00000173C0CC7D90>, 'kwargs': {'stretch': 'crude', 'min_stretch': [0], 'max_stretch': [100]}}]
[DEBUG: 2023-06-20 13:33:01 : trollimage.xrimage] Applying stretch crude with parameters {'min_stretch': [0], 'max_stretch': [100]}
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] Entering env context: <rasterio.env.Env object at 0x00000173B9736680>
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] Starting outermost env
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] No GDAL environment exists
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] New GDAL environment <rasterio._env.GDALEnv object at 0x00000173B97365F0> created
[DEBUG: 2023-06-20 13:33:01 : rasterio._filepath] Installing FilePath filesystem handler plugin...
[DEBUG: 2023-06-20 13:33:01 : rasterio._env] GDAL_DATA found in environment.
[DEBUG: 2023-06-20 13:33:01 : rasterio._env] PROJ_DATA found in environment.
[DEBUG: 2023-06-20 13:33:01 : rasterio._env] Started GDALEnv: self=<rasterio._env.GDALEnv object at 0x00000173B97365F0>.
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] Entered env context: <rasterio.env.Env object at 0x00000173B9736680>
[DEBUG: 2023-06-20 13:33:01 : rasterio._io] Path: _UnparsedPath(path='./test.tif'), mode: w, driver: GTiff
[DEBUG: 2023-06-20 13:33:01 : rasterio._io] Skipped delete for overwrite, dataset does not exist: './test.tif'
[DEBUG: 2023-06-20 13:33:01 : rasterio._base] Nodata success: 1, Nodata value: 0.000000
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] Exiting env context: <rasterio.env.Env object at 0x00000173B9736680>
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] Cleared existing <rasterio._env.GDALEnv object at 0x00000173B97365F0> options
[DEBUG: 2023-06-20 13:33:01 : rasterio._env] Stopped GDALEnv <rasterio._env.GDALEnv object at 0x00000173B97365F0>.
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] Exiting outermost env
[DEBUG: 2023-06-20 13:33:01 : rasterio.env] Exited env context: <rasterio.env.Env object at 0x00000173B9736680>
```
So one thing I noticed while playing with this that I noticed: you are passing reader_kwargs to the Scene.load call. Those need to be specified during Scene creation.
If I run your exact code I get an error about "Can't compare areas of different types" during the crop because the hrv channel has a StackedAreaDefinition for my files (a single .nat file) and the colorized clouds is an AreaDefinition.
When I moved the reader_kwargs to the Scene creation I get the expected output as far as areas are concerned:
In [8]: scn = Scene(reader="seviri_l1b_native", filenames=glob("/data/satellite/seviri_native/*.nat"), reader_kwargs={"fill_disk": True})
In [9]: scn.load(["_hrv", "colorized_ir_clouds"], upper_right_corner="NE")
/home/davidh/repos/git/satpy/satpy/readers/seviri_base.py:387: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
dataset.coords['acq_time'] = ('y', acq_time)
/home/davidh/repos/git/satpy/satpy/readers/seviri_base.py:387: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
dataset.coords['acq_time'] = ('y', acq_time)
/home/davidh/repos/git/satpy/satpy/readers/seviri_base.py:387: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
dataset.coords['acq_time'] = ('y', acq_time)
/home/davidh/repos/git/satpy/satpy/readers/seviri_base.py:387: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
dataset.coords['acq_time'] = ('y', acq_time)
In [10]: scn2 = scn.crop(xy_bbox=bbox)
In [11]: scn2["_hrv"].attrs["area"]
Out[11]:
Area ID: msg_seviri_fes_1km
Description: MSG SEVIRI Full Earth Scanning service area definition with 1 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 2676
Number of rows: 2112
Area extent: (-1775738.5364, 2999902.9794, 900620.9812, 5112186.7242)
In [12]: scn2["colorized_ir_clouds"].attrs["area"]
Out[12]:
Area ID: msg_seviri_fes_3km
Description: MSG SEVIRI Full Earth Scanning service area definition with 3 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 892
Number of rows: 704
Area extent: (-1774738.4726, 2998902.9642, 901621.1513, 5111186.793)
In [13]: scn["_hrv"].attrs["area"]
Out[13]:
Area ID: msg_seviri_fes_1km
Description: MSG SEVIRI Full Earth Scanning service area definition with 1 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 11136
Number of rows: 11136
Area extent: (-5571248.3904, -5566247.7186, 5566247.7186, 5571248.3904)
In [14]: scn["colorized_ir_clouds"].attrs["area"]
Out[14]:
Area ID: msg_seviri_fes_3km
Description: MSG SEVIRI Full Earth Scanning service area definition with 3 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 3712
Number of rows: 3712
Area extent: (-5570248.4773, -5567248.0742, 5567248.0742, 5570248.4773)
I've now updated the initial question. The areas of the _hrv and colorized_ir_clouds composites are as below:
HRV
Area ID: msg_seviri_rss_1km
Description: MSG SEVIRI Rapid Scanning Service area definition with 1 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '9.5', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 5568
Number of rows: 4176
Area extent: (-2741868.3174, 1394687.3495, 2826879.7371, 5571248.3904)
CIR
Area ID: msg_seviri_rss_3km
Description: MSG SEVIRI Rapid Scanning Service area definition with 3 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '9.5', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 3712
Number of rows: 1392
Area extent: (-5570248.4773, 1393687.2705, 5567248.0742, 5570248.4773)
I'd like to note that although fill_disk fixes the problem, it's not the ideal solution as the areas should match and filling the disk means we're carrying around 60% empty data in the arrays, as a full disk image (which is produced by fill_disk=True is 3712 lines compared to 1392 for the actual rapid scan images.
It looks like the CIR composite is cropped (not sure if you said this already), but the HRV is unchanged:
In [9]: scn2["colorized_ir_clouds"].attrs["area"]
Out[9]:
Area ID: msg_seviri_rss_3km
Description: MSG SEVIRI Rapid Scanning Service area definition with 3 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '9.5', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 892
Number of rows: 704
Area extent: (-1774738.4726, 2998902.9642, 901621.1513, 5111186.793)
In [10]: scn2["_hrv"].attrs["area"]
Out[10]:
Area ID: msg_seviri_rss_1km
Description: MSG SEVIRI Rapid Scanning Service area definition with 1 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '9.5', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 5568
Number of rows: 4176
Area extent: (-2765871.5418, 1394687.3495, 2802876.5127, 5571248.3904)
Ah I figured it out:
In [12]: scn3 = scn.crop(xy_bbox=bbox, dataset_ids=["_hrv"])
In [13]: scn3["_hrv"].attrs["area"]
Out[13]:
Area ID: msg_seviri_rss_1km
Description: MSG SEVIRI Rapid Scanning Service area definition with 1 km resolution
Projection: {'a': '6378169', 'h': '35785831', 'lon_0': '9.5', 'no_defs': 'None', 'proj': 'geos', 'rf': '295.488065897014', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 2673
Number of rows: 2110
Area extent: (-1773738.2677, 2999902.9794, 899620.8468, 5110186.4555)
The crop call is choosing the lowest resolution area to base all its math on (determining intersections, calculating slice indexes, etc). It assumes that areas with the same CRS came from the same reader and refer to/represent the same geographic region. This is not true for HRV and the lower resolution SEVIRI bands that aren't filled in to be full disk.
So...Scene.crop takes the lowest resolution area, computes slices, and tries to slice things. But the slicing doesn't make sense for HRV. At least this is my current guess. Skimming through the code it doesn't quite look this way, but I'm still figuring it out. Have to go offline for a while now.
It kind of seems like this is unfinished functionality:
https://github.com/pytroll/satpy/blob/f44473f905fa07615f4f17e5b41657645a3ce7d1/satpy/scene.py#L745-L748