
Reading ahi_l2_nc in s3

Open · meteodave opened this issue 3 months ago · 7 comments

Describe the bug

The "ahi_l2_nc" reader attempts to open the file from the local filesystem instead of from the S3 location.

To Reproduce

from satpy import Scene
from satpy.utils import debug_on; debug_on()

storage_options = {'anon': True}
filename = ["s3://noaa-himawari9/AHI-L2-FLDK-Clouds/2025/09/01/0000/AHI-CHGT_v1r1_h09_s202509010000209_e202509010009403_c202509010016093.nc"]
scn = Scene(filenames=filename, reader='ahi_l2_nc', reader_kwargs={'storage_options': storage_options})

Expected behavior

The s3 object should be read into the satpy scene object.

Actual results

[DEBUG: 2025-09-15 17:58:28 : asyncio] Using selector: EpollSelector
[DEBUG: 2025-09-15 17:58:28 : satpy.readers.core.yaml_reader] Reading ('/home/jovyan/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/etc/readers/ahi_l2_nc.yaml', '/home/jovyan/tropics-tobac_flow/tobac-flow-1.8.2/ahi_l2_nc.yaml')
[DEBUG: 2025-09-15 17:58:28 : satpy.readers.core.yaml_reader] Assigning to ahi_l2_nc: [<FSFile "noaa-himawari9/AHI-L2-FLDK-Clouds/2025/09/01/0000/AHI-CHGT_v1r1_h09_s202509010000209_e202509010009403_c202509010016093.nc">]
[DEBUG: 2025-09-15 17:58:29 : h5py._conv] Creating converter from 7 to 5
[DEBUG: 2025-09-15 17:58:29 : h5py._conv] Creating converter from 5 to 7
[DEBUG: 2025-09-15 17:58:29 : h5py._conv] Creating converter from 7 to 5
[DEBUG: 2025-09-15 17:58:29 : h5py._conv] Creating converter from 5 to 7
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    210 try:
--> 211     file = self._cache[self._key]
    212 except KeyError:

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/home/jovyan/tropics-tobac_flow/tobac-flow-1.8.2/noaa-himawari9/AHI-L2-FLDK-Clouds/2025/09/01/0000/AHI-CHGT_v1r1_h09_s202509010000209_e202509010009403_c202509010016093.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), 'bf166986-1f3b-4200-90fa-47aea971a30e']

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
Cell In[1], line 6
      4 storage_options = {'anon': True}
      5 filename = ["s3://noaa-himawari9/AHI-L2-FLDK-Clouds/2025/09/01/0000/AHI-CHGT_v1r1_h09_s202509010000209_e202509010009403_c202509010016093.nc"]
----> 6 scn = Scene(filenames=filename,reader='ahi_l2_nc',reader_kwargs={'storage_options': storage_options})

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/scene.py:155, in Scene.__init__(self, filenames, reader, filter_parameters, reader_kwargs)
    152 if filenames:
    153     filenames = convert_remote_files_to_fsspec(filenames, storage_options)
--> 155 self._readers = self._create_reader_instances(filenames=filenames,
    156                                               reader=reader,
    157                                               reader_kwargs=cleaned_reader_kwargs)
    158 self._datasets = DatasetDict()
    159 self._wishlist = set()

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/scene.py:176, in Scene._create_reader_instances(self, filenames, reader, reader_kwargs)
    171 def _create_reader_instances(self,
    172                              filenames=None,
    173                              reader=None,
    174                              reader_kwargs=None):
    175     """Find readers and return their instances."""
--> 176     return load_readers(filenames=filenames,
    177                         reader=reader,
    178                         reader_kwargs=reader_kwargs)

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/readers/core/loading.py:65, in load_readers(filenames, reader, reader_kwargs)
     63 loadables = reader_instance.select_files_from_pathnames(readers_files)
     64 if loadables:
---> 65     reader_instance.create_storage_items(
     66             loadables,
     67             fh_kwargs=reader_kwargs_without_filter[None if reader is None else reader[idx]])
     68     reader_instances[reader_instance.name] = reader_instance
     69     remaining_filenames -= set(loadables)

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/readers/core/yaml_reader.py:618, in FileYAMLReader.create_storage_items(self, files, **kwargs)
    616 def create_storage_items(self, files, **kwargs):
    617     """Create the storage items."""
--> 618     return self.create_filehandlers(files, **kwargs)

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/readers/core/yaml_reader.py:630, in FileYAMLReader.create_filehandlers(self, filenames, fh_kwargs)
    628 # load files that we know about by creating the file handlers
    629 for filetype, filetype_info in self.sorted_filetype_items():
--> 630     filehandlers = self._new_filehandlers_for_filetype(filetype_info,
    631                                                        filename_set,
    632                                                        fh_kwargs=fh_kwargs)
    634     if filehandlers:
    635         created_fhs[filetype] = filehandlers

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/readers/core/yaml_reader.py:613, in FileYAMLReader._new_filehandlers_for_filetype(self, filetype_info, filenames, fh_kwargs)
    609 filehandler_iter = self._new_filehandler_instances(filetype_info,
    610                                                    filename_iter,
    611                                                    fh_kwargs=fh_kwargs)
    612 filtered_iter = self.filter_fh_by_metadata(filehandler_iter)
--> 613 return list(filtered_iter)

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/readers/core/yaml_reader.py:595, in FileYAMLReader.filter_fh_by_metadata(self, filehandlers)
    593 def filter_fh_by_metadata(self, filehandlers):
    594     """Filter out filehandlers using provide filter parameters."""
--> 595     for filehandler in filehandlers:
    596         filehandler.metadata["start_time"] = filehandler.start_time
    597         filehandler.metadata["end_time"] = filehandler.end_time

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/readers/core/yaml_reader.py:591, in FileYAMLReader._new_filehandler_instances(self, filetype_info, filename_items, fh_kwargs)
    588     warnings.warn(str(err) + " for {}".format(filename), stacklevel=4)
    589     continue
--> 591 yield filetype_cls(filename, filename_info, filetype_info, *req_fh, **fh_kwargs)

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/readers/ahi_l2_nc.py:67, in HIML2NCFileHandler.__init__(self, filename, filename_info, filetype_info)
     65 """Initialize the reader."""
     66 super().__init__(filename, filename_info, filetype_info)
---> 67 self.nc = xr.open_dataset(self.filename,
     68                           decode_cf=True,
     69                           mask_and_scale=False,
     70                           chunks={"xc": "auto", "yc": "auto"})
     72 # Check that file is a full disk scene, we don't know the area for anything else
     73 if self.nc.attrs["cdm_data_type"] != EXPECTED_DATA_AREA:

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/api.py:760, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, create_default_indexes, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    748 decoders = _resolve_decoders_kwargs(
    749     decode_cf,
    750     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)    756     decode_coords=decode_coords,
    757 )
    759 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 760 backend_ds = backend.open_dataset(
    761     filename_or_obj,
    762     drop_variables=drop_variables,
    763     **decoders,
    764     **kwargs,
    765 )
    766 ds = _dataset_from_backend_dataset(
    767     backend_ds,
    768     filename_or_obj,
   (...)    779     **kwargs,
    780 )
    781 return ds

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:682, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, auto_complex, lock, autoclose)
    660 def open_dataset(
    661     self,
    662     filename_or_obj: T_PathFileOrDataStore,
   (...)    679     autoclose=False,
    680 ) -> Dataset:
    681     filename_or_obj = _normalize_path(filename_or_obj)
--> 682     store = NetCDF4DataStore.open(
    683         filename_or_obj,
    684         mode=mode,
    685         format=format,
    686         group=group,
    687         clobber=clobber,
    688         diskless=diskless,
    689         persist=persist,
    690         auto_complex=auto_complex,
    691         lock=lock,
    692         autoclose=autoclose,
    693     )
    695     store_entrypoint = StoreBackendEntrypoint()
    696     with close_on_error(store):

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:468, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, auto_complex, lock, lock_maker, autoclose)
    464     kwargs["auto_complex"] = auto_complex
    465 manager = CachingFileManager(
    466     netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    467 )
--> 468 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:398, in NetCDF4DataStore.__init__(self, manager, group, mode, lock, autoclose)
    396 self._group = group
    397 self._mode = mode
--> 398 self.format = self.ds.data_model
    399 self._filename = self.ds.filepath()
    400 self.is_remote = is_remote_uri(self._filename)

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:477, in NetCDF4DataStore.ds(self)
    475 @property
    476 def ds(self):
--> 477     return self._acquire()

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:471, in NetCDF4DataStore._acquire(self, needs_lock)
    470 def _acquire(self, needs_lock=True):
--> 471     with self._manager.acquire_context(needs_lock) as root:
    472         ds = _nc4_require_group(root, self._group, self._mode)
    473     return ds

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError("generator didn't yield") from None

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/file_manager.py:199, in CachingFileManager.acquire_context(self, needs_lock)
    196 @contextlib.contextmanager
    197 def acquire_context(self, needs_lock=True):
    198     """Context manager for acquiring a file."""
--> 199     file, cached = self._acquire_with_cache_info(needs_lock)
    200     try:
    201         yield file

File ~/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/xarray/backends/file_manager.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    215     kwargs = kwargs.copy()
    216     kwargs["mode"] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
    218 if self._mode == "w":
    219     # ensure file doesn't get overridden when opened again
    220     self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2521, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:2158, in netCDF4._netCDF4._ensure_nc_success()

FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/tropics-tobac_flow/tobac-flow-1.8.2/noaa-himawari9/AHI-L2-FLDK-Clouds/2025/09/01/0000/AHI-CHGT_v1r1_h09_s202509010000209_e202509010009403_c202509010016093.nc'


Environment Info:

  • OS: Linux-6.1.147-172.266.amzn2023.x86_64-x86_64-with-glibc2.35
  • Satpy Version: 0.58.0
  • PyResample Version: 1.34.2
  • Readers and writers dependencies (when relevant): output of from satpy.utils import check_satpy; check_satpy()
Readers
=======
[DEBUG: 2025-09-15 17:40:45 : pyorbital.tlefile] Path to the Pyorbital configuration (where e.g. platforms.txt is found): /home/jovyan/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/pyorbital/etc
abi_l1b:  ok
abi_l1b_scmi:  ok
abi_l2_nc:  ok
acspo:  ok
agri_fy4a_l1:  ok
agri_fy4b_l1:  ok
ahi_hrit:  ok
ahi_hsd:  ok
ahi_l1b_gridded_bin:  ok
ahi_l2_nc:  ok
ami_l1b:  ok
amsr2_l1b:  ok
amsr2_l2:  ok
amsr2_l2_gaasp:  ok
amsub_l1c_aapp:  ok
ascat_l2_soilmoisture_bufr:  cannot find module 'satpy.readers.ascat_l2_soilmoisture_bufr' (('Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes.\n           Error: ', ModuleNotFoundError("No module named 'eccodes'")))
atms_l1b_nc:  ok
atms_sdr_hdf5:  ok
avhrr_l1b_aapp:  ok
avhrr_l1b_eps:  ok
avhrr_l1b_gaclac:  cannot find module 'satpy.readers.avhrr_l1b_gaclac' (No module named 'pygac')
avhrr_l1b_hrpt:  ok
avhrr_l1c_eum_gac_fdr_nc:  ok
aws1_mwr_l1b_nc:  ok
aws1_mwr_l1c_nc:  ok
caliop_l2_cloud:  cannot find module 'satpy.readers.caliop_l2_cloud' (No module named 'pyhdf')
camel_l3_nc:  ok
clavrx:  cannot find module 'satpy.readers.clavrx' (No module named 'pyhdf')
cmsaf-claas2_l2_nc:  ok
electrol_hrit:  ok
epic_l1b_h5:  ok
eps_sterna_mwr_l1b_nc:  ok
fci_l1c_nc:  ok
fci_l2_bufr:  cannot find module 'satpy.readers.eum_l2_bufr' (Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes)
fci_l2_grib:  cannot find module 'satpy.readers.eum_l2_grib' (Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes)
fci_l2_nc:  ok
fy3a_mersi1_l1b:  ok
fy3b_mersi1_l1b:  ok
fy3c_mersi1_l1b:  ok
generic_image:  ok
geocat:  ok
gerb_l2_hr_h5:  ok
ghi_l1:  ok
ghrsst_l2:  ok
gld360_ualf2:  ok
glm_l2:  ok
gms5-vissr_l1b:  ok
goci2_l2_nc:  ok
goes-imager_hrit:  ok
goes-imager_nc:  ok
gpm_imerg:  ok
grib:  cannot find module 'satpy.readers.grib' (No module named 'pygrib')
hsaf_grib:  cannot find module 'satpy.readers.hsaf_grib' (No module named 'pygrib')
hsaf_h5:  ok
hy2_scat_l2b_h5:  ok
iasi_l2:  ok
iasi_l2_cdr_nc:  ok
iasi_l2_so2_bufr:  cannot find module 'satpy.readers.iasi_l2_so2_bufr' (('Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes.\n           Error: ', ModuleNotFoundError("No module named 'eccodes'")))
iasi_ng_l2_nc:  ok
ici_l1b_nc:  ok
insat3d_img_l1b_h5:  ok
jami_hrit:  ok
li_l2_nc:  ok
maia:  ok
mcd12q1:  cannot find module 'satpy.readers.mcd12q1' (No module named 'pyhdf')
meris_nc_sen3:  ok
mersi2_l1b:  ok
mersi3_l1b:  ok
mersi_ll_l1b:  ok
mersi_rm_l1b:  ok
mhs_l1c_aapp:  ok
mimicTPW2_comp:  ok
mirs:  ok
modis_l1b:  cannot find module 'satpy.readers.modis_l1b' (No module named 'pyhdf')
modis_l2:  cannot find module 'satpy.readers.modis_l2' (No module named 'pyhdf')
modis_l3:  cannot find module 'satpy.readers.modis_l3' (No module named 'pyhdf')
msi_l1c_earthcare:  ok
msi_safe:  ok
msi_safe_l2a:  ok
msu_gsa_l1b:  ok
mtsat2-imager_hrit:  ok
multiple_sensors_isccpng_l1g_nc:  ok
mviri_l1b_fiduceo_nc:  ok
mwi_l1b_nc:  ok
mws_l1b_nc:  ok
nucaps:  ok
nwcsaf-geo:  ok
nwcsaf-msg2013-hdf5:  ok
nwcsaf-pps_nc:  ok
oceancolorcci_l3_nc:  ok
oci_l2_bgc:  cannot find module 'satpy.readers.seadas_l2' (No module named 'pyhdf')
olci_l1b:  ok
olci_l2:  ok
oli_tirs_l1_tif:  ok
omps_edr:  ok
osisaf_nc:  ok
pace_oci_l1b_nc:  ok
safe_sar_l2_ocn:  ok
sar-c_safe:  ok
satpy_cf_nc:  ok
scatsat1_l2b:  cannot find module 'satpy.readers.scatsat1_l2b' (cannot import name 'Dataset' from 'satpy.dataset' (/home/jovyan/conda_virtual/tobac-1.6.1/lib/python3.12/site-packages/satpy/dataset/__init__.py))
seadas_l2:  cannot find module 'satpy.readers.seadas_l2' (No module named 'pyhdf')
seviri_l1b_hrit:  ok
seviri_l1b_icare:  cannot find module 'satpy.readers.seviri_l1b_icare' (No module named 'pyhdf')
seviri_l1b_native:  ok
seviri_l1b_nc:  ok
seviri_l2_bufr:  cannot find module 'satpy.readers.eum_l2_bufr' (Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes)
seviri_l2_grib:  cannot find module 'satpy.readers.eum_l2_grib' (Missing eccodes-python and/or eccodes C-library installation. Use conda to install eccodes)
sgli_l1b:  ok
slstr_l1b:  ok
smos_l2_wind:  ok
tropomi_l2:  ok
vii_l1b_nc:  ok
vii_l2_nc:  ok
viirs_compact:  ok
viirs_edr:  ok
viirs_edr_active_fires:  ok
viirs_edr_flood:  cannot find module 'satpy.readers.viirs_edr_flood' (No module named 'pyhdf')
viirs_l1b:  ok
viirs_l2:  ok
viirs_sdr:  ok
viirs_vgac_l1c_nc:  ok
virr_l1b:  ok

Writers
=======
awips_tiled:  ok
cf:  ok
geotiff:  ok
mitiff:  ok
ninjogeotiff:  ok
ninjotiff:  cannot find module 'satpy.writers.ninjotiff' (No module named 'pyninjotiff')
simple_image:  ok

Versions
========
platform: Linux-6.1.147-172.266.amzn2023.x86_64-x86_64-with-glibc2.35
python: 3.12.11

cartopy: 0.25.0
dask: 2025.7.0
fsspec: 2025.9.0
gdal: not installed
geoviews: not installed
h5netcdf: 1.6.4
h5py: 3.14.0
netcdf4: 1.7.2
numpy: 2.2.6
pyhdf: not installed
pyproj: 3.7.2
rasterio: 1.4.3
xarray: 2025.9.0

Additional context

Attempting to simultaneously load AHI L2 cloud top height to use for parallax correction of AHI L1b brightness temperatures.

meteodave · Sep 15 '25 18:09

That reader might not have as many options for reading remote data, because it isn't using some of the necessary helper functions. For example, the ABI readers use:

https://github.com/pytroll/satpy/blob/403b6574f541133c5a795422881f9d4fc769d9bb/satpy/readers/core/abi.py#L68

But this AHI L2 NC reader is using open_dataset directly:

https://github.com/pytroll/satpy/blob/403b6574f541133c5a795422881f9d4fc769d9bb/satpy/readers/ahi_l2_nc.py#L67-L71

Someone more familiar with the reader and with remote reading in Satpy may have a better idea, but I think a small PR would be needed to support S3 remote files in this reader.

djhoese · Sep 15 '25 18:09

Thank you for the quick response. I used the abi example to modify the ahi_l2_nc reader so that I can open the file directly from S3.

In ahi_l2_nc.py:

    ...
    import pathlib
    ...

    # copied from abi.py
    def open_file_or_filename(unknown_file_thing, mode=None):
        """Try to open the provided file "thing" if needed, otherwise return the filename or Path.

        This wraps the logic of getting something like an fsspec OpenFile object
        that is not directly supported by most reading libraries and making it
        usable. If a :class:`pathlib.Path` object or something that is not
        open-able is provided then that object is passed along. In the case of
        fsspec OpenFiles their ``.open()`` method is called and the result returned.
        """
        if isinstance(unknown_file_thing, pathlib.Path):
            f_obj = unknown_file_thing
        else:
            try:
                if mode is None:
                    f_obj = unknown_file_thing.open()
                else:
                    f_obj = unknown_file_thing.open(mode=mode)
            except AttributeError:
                f_obj = unknown_file_thing
        return f_obj

    ...

    class HIML2NCFileHandler(BaseFileHandler):
        ...

            # self.nc = xr.open_dataset(self.filename,
            #                           decode_cf=True,
            #                           mask_and_scale=False,
            #                           chunks={"xc": "auto", "yc": "auto"})

            f_obj = open_file_or_filename(self.filename)
            self.nc = xr.open_dataset(f_obj,
                                      decode_cf=True,
                                      mask_and_scale=False,
                                      chunks={"xc": "auto", "yc": "auto"})
            # note: the plain chunks="auto" used by abi.py does not work here

    ...

I use this code:

from satpy import Scene

storage_options = {'anon': True}
filename = ["s3://noaa-himawari9/AHI-L2-FLDK-Clouds/2025/09/01/0000/AHI-CHGT_v1r1_h09_s202509010000209_e202509010009403_c202509010016093.nc"]
scn = Scene(filenames=filename, reader='ahi_l2_nc', reader_kwargs={'storage_options': storage_options})
scn.load(["cloud_top_height"])
cth = scn["cloud_top_height"]
print(cth)

to get this:

<xarray.DataArray 'CldTopHght' (y: 5500, x: 5500)> Size: 121MB
dask.array<open_dataset-CldTopHght, shape=(5500, 5500), dtype=float32, chunksize=(200, 200), chunktype=numpy.ndarray>
Coordinates:
    crs      object 8B PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["unknown...
  * y        (y) float64 44kB 5.499e+06 5.497e+06 ... -5.497e+06 -5.499e+06
  * x        (x) float64 44kB -5.499e+06 -5.497e+06 ... 5.497e+06 5.499e+06
Attributes: (12/15)
    long_name:            Cloud Top Height
    _FillValue:           -999.0
    valid_range:          [ -300. 20000.]
    units:                Meter
    sensor:               ahi
    platform_name:        Himawari-9
    ...                   ...
    name:                 cloud_top_height
    modifiers:            ()
    reader:               ahi_l2_nc
    area:                 Area ID: Himawari_Area\nDescription: AHI Full Disk ...
    _satpy_id:            DataID(name='cloud_top_height', modifiers=())
    ancillary_variables:  []

meteodave avatar Sep 15 '25 19:09 meteodave

When actually using the data, though, the reader appears to fetch from S3 too many times:

...
[DEBUG: 2025-09-16 16:46:26 : s3fs] Fetch: noaa-himawari9/AHI-L2-FLDK-Clouds/2025/09/01/0000/AHI-CHGT_v1r1_h09_s202509010000209_e202509010009403_c202509010016093.nc, 225658590-278199454
[DEBUG: 2025-09-16 16:46:27 : fsspec] <File-like object S3FileSystem, noaa-himawari9/AHI-L2-FLDK-Clouds/2025/09/01/0000/AHI-CHGT_v1r1_h09_s202509010000209_e202509010009403_c202509010016093.nc> read: 225658590 - 225770654  , readahead: 166 hits, 124 misses, 6509577130 total requested bytes
...

So some more optimization seems to be needed.

meteodave · Sep 16 '25 17:09

Looks like that is requesting a different (but overlapping) byte range of the S3 object. I believe there are options you can add to the URL to tell fsspec to cache the file locally. @mraspaud is working on improvements to use a UPath object instead of Satpy's custom FSFile object, but that doesn't change how the underlying library interacts with the file. I don't remember the current state of the NetCDF4 C library's support for S3 URLs passed directly to it, but allowing that in Satpy may be what we need for the best performance. Alternatively, we could use kerchunk and similar packages to pre-define the byte ranges for certain information in a NetCDF file.

djhoese · Sep 16 '25 17:09

@ghiggi has also done a lot of testing in the past on profiling various remote file access methods.

djhoese · Sep 16 '25 17:09

I think I wrote this reader originally and didn't include fsspec support, as it was in its infancy then and, to be honest, I didn't understand it. Adding support shouldn't be that hard, but my time is very limited just now. Maybe you can give it a go? Otherwise I can take a look whenever I get the time (probably November).

simonrp84 · Sep 16 '25 18:09

A common approach when working with remote files is to enable caching, to avoid re-downloading data. See the fsspec documentation on how to do that.

mraspaud · Sep 17 '25 09:09
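As a concrete illustration of that caching suggestion, here is a sketch assuming fsspec's simplecache protocol chaining (the S3 form follows the general pattern documented for fsspec chained URLs; the local-file demo exists only to keep the snippet self-contained and runnable without network access):

```python
# Hedged sketch of fsspec URL chaining for local caching, not code from this
# thread. For the S3 case discussed here it would look roughly like:
#   filenames = ["simplecache::s3://noaa-himawari9/AHI-L2-FLDK-Clouds/.../file.nc"]
#   reader_kwargs = {"storage_options": {"s3": {"anon": True}}}
# (note the extra nesting: options are keyed by protocol when chaining).
# Below, the same "simplecache::" chaining is demonstrated with a local file.
import os
import tempfile

import fsspec

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "data.bin")
with open(src, "wb") as f:
    f.write(b"hello")

cache_dir = os.path.join(tmpdir, "cache")
with fsspec.open("simplecache::file://" + src,
                 simplecache={"cache_storage": cache_dir}) as f:
    data = f.read()

print(data)  # b'hello'
# A cached copy now lives in cache_dir, so repeated opens read locally.
```

With this approach the first access downloads the whole object once, which trades startup time for avoiding the repeated overlapping range requests reported above.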