scn.available_dataset_ids() should indicate calibrations in a more user-friendly manner
Describe the bug
I used scn.available_dataset_ids() to inquire the available calibrations for a particular channel. Unfortunately, the result was not helpful, as it just indicated calibrations <1>, <3>, and <4>.
To Reproduce
from satpy import Scene
from glob import glob
filenames = glob("W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*.nc")
sc = Scene(filenames=filenames, reader="fci_l1c_nc")
print([x for x in sc.available_dataset_ids() if x["name"] == "vis_06"])
Expected behavior
I would expect parameters that I can pass directly to sc.load(["vis_06}', calibration=...), perhaps:
[DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration="counts", modifiers=()),
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration="radiance", modifiers=()),
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration="reflectance", modifiers=())]
Actual results
In reality, I get numbers that I cannot meaningfully use or interpret as a human:
[DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<1>, modifiers=()),
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<3>, modifiers=()),
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<4>, modifiers=())]
Environment Info:
- Satpy Version: 0.42.3.dev162+g89891b30e.d20230629
Additional context
I guess it's an enum behind the scenes and that the value rather than the description of the enum is represented.
Unless something is customized for FCI, this is supposed to be as you expect. It might show the enum, but I thought that enum had the string values. I wonder if I broke this when I worked on Scene serialization stuff a while ago. Any idea if it is this way (integers) for other readers?
It's the same for SEVIRI:
from satpy import Scene
from glob import glob
seviri_files = glob("/media/nas/x21308/scratch/SEVIRI/202103300900/H-000*")
sc = Scene(filenames={"seviri_l1b_hrit": seviri_files})
print(sc.available_dataset_ids())
also gives results with calibration=<1>, calibration=<3>, etc.
Yeah I see it for ABI too. What version of Python are you on?
Should be an easy fix:
https://github.com/pytroll/satpy/blob/6326b0358a8941cb509f3cd14f761d64807c9e3d/satpy/dataset/dataid.py#L89-L91
I'm on Python 3.11.4.
This is what it would show if we didn't have that __repr__:
In [14]: dataid.IntEnum.__repr__(data_id["calibration"])
Out[14]: '<calibration.reflectance: 1>'
Ah it's a Python 3.10 to 3.11 change, here is Python 3.10:
In [1]: from satpy import Scene; from glob import glob
scn
In [2]: scn = Scene(reader="abi_l1b", filenames=glob("/data/satellite/abi/20180711/OR_ABI-L1b-RadF-M3C02_G16_s20181921800440_e20181921811207_c20181921811242.nc"))
In [3]: scn.available_dataset_ids()
Out[3]:
[DataID(name='C02', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<calibration.reflectance>, modifiers=()),
DataID(name='C02', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<calibration.radiance>, modifiers=()),
DataID(name='C02', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<calibration.counts>, modifiers=())]
And it is documented:
https://docs.python.org/3/library/enum.html#enum.IntEnum
so in that code snippet above, we should replace str(self) with repr(self), right?
Depends what we want exactly and how compatible it is between Python versions. Do we want it to be "reflectance"? Or "<reflectance>"? Or something else?
So I think if we do self.name in the __repr__ I referenced above this would fix it. @gerritholl Have time to make a pull request?
shouldn't we change __str__ rather than __repr__? I think str should be more user friendly, while repr is more for serializing
I don't care (nor do I necessarily know) what has to be changed, but I think we want an ipython session doing scn.available_dataset_ids() to show a DataID object with "reflectance" rather than <calibration.reflectance: 1>, right? ipython and other REPLs will be doing repr unless you do print which I think does __str__?