Confusing documentation for creating a Scene without a filename or reader
The documentation in the Scene code indicates that a Scene object can be created without a filename or reader. The documentation refers to a Dataset, which looks like the nomenclature used to refer to an xarray Dataset. However, the "Dataset" in
scn = Scene()
scn['my_dataset'] = Dataset(my_data_array, **my_info)
Is referring to a Dataset object that preceded the use of xarray in SatPy, and the documentation itself should have been purged.
I'm really confused by that example code. The old Dataset didn't have anything to do with xarray, but this code is passing a DataArray apparently.
The example code is taken from the Scene documentation. I am also confused by the documentation code. Not certain if the "Dataset" referred to a Dataset from xarray, I used both
scn['my_dataset'] = Dataset(xr.DataArray, **my_info)
and
scn['my_dataset'] = xr.Dataset
with a dtype in the attrs among other things like the crs and a dataset name.
from satpy import Scene
new_scene = Scene()
# Load a Scene (not worrying about datetime for now...) from GOES-18 (west) and GOES-16 (east)
g16 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G16_s20231461636171.level2.nc"], reader="clavrx")
g18 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G18_s20231531721182.level2.nc"], reader="clavrx")
g16.load(["cloud_mask"])
g18.load(["cloud_mask"])
# Save the area definition of the GOES-18 (west) satellite for later.
area_def18 = g18["cloud_mask"].attrs["area"]
# These global attributes are saved so that it is easier to record the original date of the projection information used.
c18_attrs = g16["cloud_mask"].attrs
# Resample the GOES-16 data into the GOES-18 projection.
g16_ol_in_g18space = g16.resample(area_def18, resampler="nearest")
# Mask the region where there is overlap.
g16_ol_in_g18space = xr.where(g16_ol_in_g18space["cloud_mask"] >= 0, 1, 0)
# Create a new scene.
new_scene["overlap_mask"] = Dataset(data_vars={'overlap': (['y', 'x'], g16_ol_in_g18space.data)},
coords=g18["cloud_mask"].coords,
attrs=dict(name="overlap_mask", long_name="overlap_mask", platform_name="GOES-18",
start_time=c18_attrs["start_time"].isoformat(), flag_values=[0, 1],
flag_meaning="0: No Overlap, 1: Overlap",
dtype=g16_ol_in_g18space.data.dtype))
Creates a Scene and a dataset that looks kind of right?
new_scene["overlap_mask"]
Out[22]:
<xarray.Dataset> Size: 60MB
Dimensions: (y: 1500, x: 2500)
Coordinates:
latitude (y, x) float32 15MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
longitude (y, x) float32 15MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
crs object 8B PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["Unkno...
* y (y) float64 12kB 4.588e+06 4.586e+06 ... 1.586e+06 1.584e+06
* x (x) float64 20kB -2.504e+06 -2.502e+06 ... 2.502e+06 2.504e+06
Data variables:
overlap (y, x) int64 30MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
Attributes:
name: overlap_mask
long_name: overlap_mask
platform_name: GOES-18
start_time: 2023-05-26T16:36:17.100000
flag_values: [0, 1]
flag_meaning: 0: No Overlap, 1: Overlap
dtype: int64
_satpy_id: DataID(name='overlap_mask')
However,
new_scene.save_datasets(filename="test.nc")
raises an exception (I am assuming because I am using an xarray Dataset, rather than the Dataset that was intended in the documentation):
/Users/joleenf/Projects/polar2grid/satpy/satpy/writers/cf_writer.py:274: UserWarning: dtype int64 not compatible with CF-1.7.
grouped_datasets, header_attrs = collect_cf_datasets(list_dataarrays=datasets, # list of xr.DataArray
Traceback (most recent call last):
File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-20-22641c74dec4>", line 1, in <module>
new_scene.save_datasets(filename="test.nc")
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/scene.py", line 1293, in save_datasets
return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/writers/cf_writer.py", line 274, in save_datasets
grouped_datasets, header_attrs = collect_cf_datasets(list_dataarrays=datasets, # list of xr.DataArray
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/datasets.py", line 232, in collect_cf_datasets
ds = _collect_cf_dataset(
^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/datasets.py", line 137, in _collect_cf_dataset
new_dataarray = make_cf_data_array(new_dataarray,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/data_array.py", line 79, in make_cf_data_array
dataarray = _preprocess_data_array_name(dataarray=dataarray,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/data_array.py", line 49, in _preprocess_data_array_name
dataarray = dataarray.rename(new_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/core/dataset.py", line 4282, in rename
return self._rename(name_dict=name_dict, **names)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/core/dataset.py", line 4217, in _rename
name_dict = either_dict_or_kwargs(name_dict, names, "rename")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/namedarray/utils.py", line 193, in either_dict_or_kwargs
raise ValueError(f"the first argument to .{func_name} must be a dictionary")
ValueError: the first argument to .rename must be a dictionary```
So a Scene should only ever contain DataArray objects except in I think a very small exception case (temporary container during MultiScene animation stuff). I mentioned on slack that there is a to_xarray method on the Scene which should give you a Dataset object that is xarray-savable with .to_netcdf() I think. That would be my first try for what you're doing, but regardless this documentation should be updated to use a DataArray and assigning to attrs= for the metadata unless I'm missing something.
Sorry, I had missed the "to_xarray" mention. to_xarray() works with to_netcdf() for me.
import xarray as xr
from satpy import Scene
from xarray import DataArray
g16 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G16_s20231461636171.level2.nc"], reader="clavrx")
g18 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G18_s20231531721182.level2.nc"], reader="clavrx")
g16.load(["cloud_mask"])
g18.load(["cloud_mask"])
area_def18 = g18["cloud_mask"].attrs["area"]
g16_ol_in_g18space = g16.resample(area_def18, resampler="nearest")
g18mask = Scene()
g18mask["overlap"] = DataArray(data=xr.where(g16_ol_in_g18space["cloud_mask"] >= 0, 1, 0),
dims=['y', 'x'],
attrs=dict(name="overlap_mask", long_name="G18FixedGrid_overlap_G16",
platform_name="GOES-18",
start_time=g16_ol_in_g18space["cloud_mask"].attrs["start_time"], flag_values=[0, 1],
flag_meaning="0: No Overlap, 1: Overlap",
area=area_def18))
g18mask.save_dataset(filename="{long_name}.nc", dataset_id="overlap")
works. So maybe it is just clarifying the old documentation?
Yeah it is likely as simple as changing Dataset to DataArray and using attrs= instead of ** on the info dictionary. Would you mind making a pull request?