Scene creation fails for a large number of files due to a "maximum recursion depth exceeded" error
Describe the bug
The creation of a Scene for OLCI data fails once the number of files provided via the filenames argument exceeds 3000. The error thrown is RecursionError: maximum recursion depth exceeded. This suggests that the files are read in by a recursive approach internally rather than sequentially. This seems an unnecessary limitation on the number of files that can be read, because a recursive solution could easily be avoided.
To Reproduce
from datetime import datetime

from satpy import Scene, find_files_and_readers

files = find_files_and_readers(sensor='olci',
                               start_time=datetime(2025, 11, 11, 0, 10),
                               end_time=datetime(2025, 11, 11, 20, 0),
                               base_dir='/path/to/data',
                               reader='olci_l1b')
print(len(files['olci_l1b']))
scn = Scene(filenames=files)
Environment Info:
- OS: Linux
- Satpy Version: 0.59.0
This is...interesting. That is a lot of files. So I have a couple of points and a couple of questions:
- Normally a Satpy Scene can only handle one orbit of data, or one time step in the case of geostationary data. For polar-orbiting (swath-based) data, resampling multiple orbits will not behave as expected: the resampling algorithms do not take time into account and pick pixels based on location only, which produces awkward output when multiple orbits are blended together.
- I'm surprised by this failing from recursion. Do you have the end of the traceback so we can try to track down and possibly fix this unexpected recursion?
- The default number of files that can be opened by one process on Linux is 1024 (ulimit -n), so even if there weren't a recursion issue and the Satpy resampling handled it properly, you wouldn't be able to open that many files without modifying your system's settings to allow for that many (see the sketch after this list for checking the limit from Python).
- What is your use case? What are you trying to accomplish? Maybe we can suggest a different approach?
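To illustrate the per-process limit mentioned above, here is a minimal sketch for checking it from Python; it uses only the standard-library resource module (Unix only) and is not part of Satpy.

import resource

# Soft and hard limits on open file descriptors for the current process;
# the soft limit is what "ulimit -n" reports (commonly 1024 on Linux).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# The soft limit can usually be raised up to the hard limit without root:
# resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))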
Thank you for the quick reply. Here are my answers to your points:
- It is good to know that Satpy blends the data from different orbits; I was already wondering what happens in the polar regions when multiple orbits overlap.
- Here is the end of the traceback:
File ~/miniforge3/lib/python3.12/site-packages/satpy/scene.py:155, in Scene.__init__(self, filenames, reader, filter_parameters, reader_kwargs)
152 if filenames:
153 filenames = convert_remote_files_to_fsspec(filenames, storage_options)
--> 155 self._readers = self._create_reader_instances(filenames=filenames,
156 reader=reader,
157 reader_kwargs=cleaned_reader_kwargs)
158 self._datasets = DatasetDict()
159 self._wishlist = set()
File ~/miniforge3/lib/python3.12/site-packages/satpy/scene.py:176, in Scene._create_reader_instances(self, filenames, reader, reader_kwargs)
171 def _create_reader_instances(self,
172 filenames=None,
173 reader=None,
174 reader_kwargs=None):
175 """Find readers and return their instances."""
--> 176 return load_readers(filenames=filenames,
177 reader=reader,
178 reader_kwargs=reader_kwargs)
File ~/miniforge3/lib/python3.12/site-packages/satpy/readers/core/loading.py:65, in load_readers(filenames, reader, reader_kwargs)
63 loadables = reader_instance.select_files_from_pathnames(readers_files)
64 if loadables:
---> 65 reader_instance.create_storage_items(
66 loadables,
67 fh_kwargs=reader_kwargs_without_filter[None if reader is None else reader[idx]])
68 reader_instances[reader_instance.name] = reader_instance
69 remaining_filenames -= set(loadables)
File ~/miniforge3/lib/python3.12/site-packages/satpy/readers/core/yaml_reader.py:618, in FileYAMLReader.create_storage_items(self, files, **kwargs)
616 def create_storage_items(self, files, **kwargs):
617 """Create the storage items."""
--> 618 return self.create_filehandlers(files, **kwargs)
File ~/miniforge3/lib/python3.12/site-packages/satpy/readers/core/yaml_reader.py:643, in FileYAMLReader.create_filehandlers(self, filenames, fh_kwargs)
636 self.file_handlers[filetype] = sorted(
637 self.file_handlers.get(filetype, []) + filehandlers,
638 key=lambda fhd: (fhd.start_time, fhd.filename))
640 # Update dataset IDs with IDs determined dynamically from the file
641 # and/or update any missing metadata that only the file knows.
642 # Check if the dataset ID is loadable from that file.
--> 643 self.update_ds_ids_from_file_handlers()
644 return created_fhs
File ~/miniforge3/lib/python3.12/site-packages/satpy/readers/core/yaml_reader.py:686, in FileYAMLReader.update_ds_ids_from_file_handlers(self)
684 avail_datasets = self._file_handlers_available_datasets()
685 new_ids = {}
--> 686 for is_avail, ds_info in avail_datasets:
687 # especially from the yaml config
688 coordinates = ds_info.get("coordinates")
689 if isinstance(coordinates, list):
690 # xarray doesn't like concatenating attributes that are
691 # lists: https://github.com/pydata/xarray/issues/2060
File ~/miniforge3/lib/python3.12/site-packages/satpy/readers/core/file_handlers.py:275, in BaseFileHandler.available_datasets(self, configured_datasets)
182 def available_datasets(self, configured_datasets=None):
183 """Get information of available datasets in this file.
184
185 This is used for dynamically specifying what datasets are available
(...) 273
274 """
--> 275 for is_avail, ds_info in (configured_datasets or []):
276 if is_avail is not None:
277 # some other file handler said it has this dataset
278 # we don't know any more information than the previous
279 # file handler so let's yield early
280 yield is_avail, ds_info
File ~/miniforge3/lib/python3.12/site-packages/satpy/readers/core/file_handlers.py:275, in BaseFileHandler.available_datasets(self, configured_datasets)
182 def available_datasets(self, configured_datasets=None):
183 """Get information of available datasets in this file.
184
185 This is used for dynamically specifying what datasets are available
(...) 273
274 """
--> 275 for is_avail, ds_info in (configured_datasets or []):
276 if is_avail is not None:
277 # some other file handler said it has this dataset
278 # we don't know any more information than the previous
279 # file handler so let's yield early
280 yield is_avail, ds_info
[... skipping similar frames: BaseFileHandler.available_datasets at line 275 (2969 times)]
File ~/miniforge3/lib/python3.12/site-packages/satpy/readers/core/file_handlers.py:275, in BaseFileHandler.available_datasets(self, configured_datasets)
182 def available_datasets(self, configured_datasets=None):
183 """Get information of available datasets in this file.
184
185 This is used for dynamically specifying what datasets are available
(...) 273
274 """
--> 275 for is_avail, ds_info in (configured_datasets or []):
276 if is_avail is not None:
277 # some other file handler said it has this dataset
278 # we don't know any more information than the previous
279 # file handler so let's yield early
280 yield is_avail, ds_info
RecursionError: maximum recursion depth exceeded
- Yes, but it should be possible to open and close the files one after another to extract data from them all, shouldn't it?
- The use case is to create a daily (downsampled) geoprojected plot of all the data collected from OLCI during that day. This would be used for internal monitoring of the instrument status, because missing data from one of the OLCI cameras is very easy to spot in such a plot. As an alternative, we are in any case considering creating one plot per orbit, or at least one per small group of orbits.
- Yes, I wouldn't recommend doing more than one orbit at a time even if the rest of this wasn't failing.
- Very interesting. That isn't technically recursion in the normal sense, but rather a generator being passed to a generator being passed to a generator, and so on. The hope with this implementation was to avoid generating and iterating over a list multiple times. A fix probably isn't too hard, but I'd have to consider the pros and cons (see the generator sketch after this list for what is going on).
- Yes, but the reading code would have to be very smart about how it does things. With dask, which Satpy is using, this gets even more complicated, as the easiest solution is to open the file and let dask hold on to it until it needs the data and eventually loads it. Having dask re-open the file in each thread that needs it is typically not great for performance and is not the most obvious "out of the box" solution. You end up having to open the file in the main thread, parse out all the attributes and variables that you want/need to use, then pass the original filename to a function that dask will call later in a separate thread. In that function, you re-open the file, pull the data out, then close the file. Depending on the file format this gets even more difficult (a rough sketch of this pattern follows after this list).
- Sounds good. If I were doing this I would process one orbit at a time, save to a geotiff, then call something like gdal_merge on the command line to merge them all together. If you know the target area/extent that you want the final image to be on, you could resample to that large grid with Satpy, and that would speed things up for gdal_merge. Satpy also has a MultiScene which could do this blending of orbits for you, but honestly I'm not sure this is the right use case for it. Plus, if you process one orbit at a time you can do it in parallel with simpler code, or in real time as the orbit data becomes available.
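To make the "generator passed to a generator" point above concrete, here is a minimal standalone sketch (not Satpy code; the available_datasets name only mirrors the real method) showing how stacking one generator per file handler eventually exhausts the interpreter's recursion limit:

import sys

def available_datasets(configured_datasets=None):
    # Each wrapping layer first re-yields everything the previous layer produced ...
    for is_avail, ds_info in (configured_datasets or []):
        yield is_avail, ds_info
    # ... and then adds its own entry.
    yield True, {"name": "dummy"}

print(sys.getrecursionlimit())  # typically 1000

configured = None
for _ in range(3000):  # one wrapping generator per "file handler"
    configured = available_datasets(configured)

list(configured)  # raises RecursionError: maximum recursion depth exceeded

Consuming each layer's output into a list before passing it on, instead of passing the generator itself, would keep the depth constant, at the cost of materializing the list multiple times.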
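And here is a rough sketch of the "open in the main thread, re-open in the dask task" pattern described above, assuming a netCDF file readable with xarray; the helper names (_load_var, lazy_variable) are made up for illustration and are not Satpy APIs:

import dask.array as da
import xarray as xr
from dask import delayed

def _load_var(filename, var_name):
    # Called later by dask in a worker thread: re-open, read, close.
    with xr.open_dataset(filename) as ds:
        return ds[var_name].values

def lazy_variable(filename, var_name):
    # Open once in the main thread only to grab shape/dtype metadata ...
    with xr.open_dataset(filename) as ds:
        shape = ds[var_name].shape
        dtype = ds[var_name].dtype
    # ... then hand the *filename* (not an open handle) to a delayed task.
    return da.from_delayed(delayed(_load_var)(filename, var_name),
                           shape=shape, dtype=dtype)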
On point 4, I would resample each orbit to a world map and then save it as a NetCDF. Then you can read them again with Satpy, and as they would all have the same area, you could read all the orbit files for a day together and use a bucket-averaging resampler or similar to get the averages. With geotiff you can read back images written with Satpy, but this is not designed to retain the original pixel values.
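A minimal sketch of the first half of this suggestion (resample one orbit to a common grid and write CF NetCDF). The band, paths, and area name are placeholders, and the exact daily-averaging workflow is not tested here:

from satpy import Scene

# One orbit's worth of OLCI files (placeholder paths), resampled to a common
# global grid and written out as CF NetCDF.
orbit_files = ['/path/to/orbit/files']          # placeholder
scn = Scene(filenames=orbit_files, reader='olci_l1b')
scn.load(['Oa08'])                              # any band of interest
global_scn = scn.resample('my_global_area')     # placeholder global area name
global_scn.save_datasets(writer='cf', filename='orbit_000.nc')

# Per the suggestion above, the per-orbit NetCDF files (all on the same area)
# would then be read back with reader='satpy_cf_nc' and combined for the day,
# e.g. with a bucket-averaging resampler (resampler='bucket_avg').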
@gerritholl I hadn't considered reading the geotiffs with Satpy, and yes, this could be troublesome (given other issues filed against the generic_image reader). I offered geotiff as a suggestion because, if the desired end result is a geotiff, then creating one is fast and easy to manipulate with GDAL (gdal_merge).
Just my 2 cents: we use a dynamic area in the right projection to resample each segment and then use a WMS to put them all together (with VIIRS granules). A WMS is not necessary in your case, I guess; a simple gdal_merge should work if you just need a daily image.