
MTG FCI high memory consumption when loading single channels

Open endutra opened this issue 7 months ago • 5 comments

While testing satpy to load MTG FCI images I came across rather high memory usage. In the example below satpy uses about 2.4 GB to load a single channel, going up to 2.9 GB when loading 4 channels. A simple reader using h5py uses only 285 MB (and it's much faster).

Is there some configuration that I'm missing, or is this a "feature" of the reader to cope with the high-level interface and other functionality?

I also need to force gc.collect() at the end of a function using a satpy Scene, otherwise I get segmentation faults in a subsequent call to that function.

Using satpy version 0.51.0, Python 3.12.6, conda installation.

## Test to load FCI with satpy and h5py

# Run tests in bash 
"""
# check functions return identical results
python3 mtg_read_test.py compare

# memory usage (twice to be sure)
for tt in h5py satpy satpyALL
do
    /usr/bin/time  -f "Max_Memory (Kb) $tt: %M" python3 mtg_read_test.py $tt
    /usr/bin/time  -f "Max_Memory (Kb) $tt: %M" python3 mtg_read_test.py $tt
done

"""


def load_satpy_single(channel,calibration):

    # function to load single channel/calibration from FCI using satpy

    import hdf5plugin
    from satpy.scene import Scene
    import gc


    # load scene
    t0 = time.time()
    c = Scene(filenames=file_list,reader=['fci_l1c_nc'])
    c.load([channel], calibration=calibration)

    # get value
    x_ = c[channel].values

    print(f"Loaded satpy {channel} {calibration} in {time.time()-t0:.2f} sec {x_.dtype},{x_.shape}")

    ## clean Scene otherwise get Segmentation fault in a second call ...
    del c
    gc.collect()

    return x_

def load_satpy_multiple(channel,calibration):

    # function to load multiple channel/calibration from FCI using satpy

    import hdf5plugin
    from satpy.scene import Scene
    import gc


    # load scene
    t0 = time.time()
    c = Scene(filenames=file_list,reader=['fci_l1c_nc'])
    c.load(channel, calibration=calibration)

    # get value
    x_ = {}
    for chn in channel:
        x_[chn] = c[chn].values

    print(f"Loaded satpy {channel} {calibration} in {time.time()-t0:.2f} sec {x_[channel[0]].dtype},{x_[channel[0]].shape}")

    ## clean Scene otherwise get Segmentation fault in a second call ...
    del c
    gc.collect()

    return x_


def load_fci_channel(chn,calibration='radiance'):

    # function to load a single channel/calibration from FCI using h5py

    import h5py
    import numpy as np
    import warnings

    def fci_radiance_to_bt(xdata,attr):
        n = attr['radiance_to_bt_conversion_coefficient_wavenumber']
        a = attr['radiance_to_bt_conversion_coefficient_a']
        b = attr['radiance_to_bt_conversion_coefficient_b']
        c1 = attr['radiance_to_bt_conversion_constant_c1']
        c2 = attr['radiance_to_bt_conversion_constant_c2']

        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            brt = (c2*n) / (a * np.log(
                  1 + (c1 * n**3 / xdata)
                  )) - b / a
        return brt

    def fci_counts_to_radiance(xdata,attr):
        return np.where(xdata != attr['fillvalue'],
                   (xdata * attr['scale_factor']) + attr['add_offset'],
                    np.nan)

    rvar = 'effective_radiance'

    t0=time.time()
    group = f"data/{chn}/measured"

    for ik,ff in enumerate(file_list):
        t_ = time.time()

        f = h5py.File(ff,'r')

        if ik == 0 :
            # first file, find dimensions and allocate full image

            ncol = f[f'{group}/x'].size
            fillvalue = f[f'{group}/{rvar}'].attrs['_FillValue'][0]
            vrange = f[f'{group}/{rvar}'].attrs['valid_range']

            # initialize image to fillvalue assuming x size == y size..
            if calibration == 'counts':
                # initialize with native dtype
                xdisk = np.zeros((ncol,ncol),dtype=f[f'{group}/{rvar}'].dtype) + fillvalue
            else:
                # initialize as 'f4'
                xdisk = np.zeros((ncol,ncol),dtype='f4') + fillvalue

            # Get other attributes
            attrs = {}
            for key in  ['channel_effective_solar_irradiance',
                         'radiance_to_bt_conversion_coefficient_a',
                         'radiance_to_bt_conversion_coefficient_b',
                         'radiance_to_bt_conversion_coefficient_wavenumber',
                         'radiance_to_bt_conversion_constant_c1',
                         'radiance_to_bt_conversion_constant_c2',
                         'radiance_unit_conversion_coefficient',]:
                attrs[key] = f[f'{group}/{key}'][()]
            attrs['fillvalue'] = fillvalue
            attrs['scale_factor'] =  f[f'{group}/{rvar}'].attrs['scale_factor'][0]
            attrs['add_offset'] =  f[f'{group}/{rvar}'].attrs['add_offset'][0]
            attrs['channel'] = chn
            attrs['variable'] = rvar


        # get raw data
        x_ = f[f'{group}/{rvar}'][:]

        # get positions
        start_row = f[f'{group}/start_position_row'][()] - 1
        start_col = f[f'{group}/start_position_column'][()] - 1


        # set correct fill_value for values outside valid range
        x_[(x_ < vrange[0]) | (x_ > vrange[1]) ] = fillvalue


        if calibration == "radiance":
            x_ = fci_counts_to_radiance(x_,attrs)
        elif calibration == "brightness_temperature":
            x_ = fci_radiance_to_bt(fci_counts_to_radiance(x_,attrs),attrs)

        # fill image
        xdisk[start_row: start_row + x_.shape[0],
              start_col: start_col + x_.shape[1]] = x_[:]

        f.close()

    print(f"h5py read {chn} {calibration} in {time.time()-t0:.2f} sec {xdisk.dtype} {xdisk.shape}")
    return xdisk, attrs


from glob import glob
import time
import sys
import numpy as np

## Get list of FCI Files for a repeat cycle
repeat_cycle = "0075"
dir_fci = f"/NFS13/Operational/RAW_DATA/MTG/2025/06/02/"
file_list = glob(f"{dir_fci}/*BODY*{repeat_cycle}_00??.nc")


## MTG Load
channels=['ir_105','ir_123','ir_133','ir_38']
calibration = 'radiance'

if sys.argv[1] == "compare":
    # loop on channel to check if both functions return identical arrays
    for chn in channels:
        xSPY = load_satpy_single(chn,calibration)
        xH5P , attr = load_fci_channel(chn,calibration)
        print(f"Arrays equal: {np.allclose(xSPY,xH5P,equal_nan=True)}, #pixel diff>0 {np.sum(np.abs(xSPY-xH5P)>0)}")

if sys.argv[1] == "satpy":
    # satpy load for memory usage check
    for chn in channels:
        xSPY = load_satpy_single(chn,calibration)

if sys.argv[1] == "satpyALL":
    # satpy load for memory usage check all channels
    xSPY = load_satpy_multiple(channels,calibration)

if sys.argv[1] == "h5py":
    # h5py load for memory usage check
    for chn in channels:
        xH5P , attr = load_fci_channel(chn,calibration)

endutra avatar Jun 03 '25 08:06 endutra

@endutra You've made some really good points. Some of the core maintainers (@mraspaud, @pnuu) and I just discussed this in our monthly meeting. We're hoping when @ameraner has time he can give a more in-depth answer.

Overall we are surprised by the small amount of memory usage you were able to obtain using pure h5py. At the same time though, the high memory usage of the Satpy reader is likely due to a lot of caching being done. The loading/opening of all the FCI files and then caching some of these fields causes the creation of the Scene to be very slow, but it is supposed to make loading multiple channels much faster than it would be otherwise. Your single channel loading with Satpy is avoiding (and being punished by) this caching so that portion of the comparison may not be "fair".

Another thing to consider, which I'm just realizing, is that Satpy is using dask. Depending on your dask configuration (number of workers) and the chunk size Satpy should use for its dask arrays, you could be seeing large memory usage while dask computes things in parallel (multiple threads). This is theoretically supposed to make things faster, but it really depends on how things are configured and how your system handles it. The Satpy FAQ may be able to help with some of these issues: https://satpy.readthedocs.io/en/stable/faq.html. Could you provide us with more information on your system? Is it your local laptop/desktop? Number of logical or physical cores? Total memory?
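
As a minimal sketch of that kind of dask tuning (run it before creating the Scene; the chunk-size value is only an illustration and would need tuning for your system):

# Minimal dask tuning sketch: run before creating the Scene.
import dask

# Limit the threaded scheduler to a single worker so memory is not
# multiplied across parallel chunk computations.
dask.config.set(num_workers=1)

# Ask dask for smaller array chunks (illustrative value, tune for your system).
dask.config.set({"array.chunk-size": "64MiB"})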

A lot of work has been done to make the reading of FCI files as fast as possible, but I'm sure there could be improvements. For example, @mraspaud brought up that he played with the new "datatree" support in Xarray and got similar performance to the current implementation with much less complexity in the code. If using h5py instead of NetCDF4 is also an improvement, it would be great to have someone like you look at improving Satpy's reader.
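
For reference, a rough sketch of what that datatree-style access could look like (not the current reader implementation): it assumes a recent xarray with open_datatree, the h5netcdf engine, and the same data/<channel>/measured layout used in the h5py script above, with a placeholder filename.

# Sketch only: open a single FCI chunk file as an xarray DataTree.
import hdf5plugin  # needed to decode the FCI compression, as in the script above
import xarray as xr

tree = xr.open_datatree("FCI_BODY_chunk.nc", engine="h5netcdf")  # placeholder filename
rad = tree["data/ir_105/measured/effective_radiance"]  # lazily accessed variable
print(rad.dims, rad.dtype)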

We didn't have a good explanation for the segmentation faults you were seeing, other than that repeatedly reading the same files the way Satpy reads them is not entirely unexpected to cause some weird issues. There is a lot of complexity in the netCDF4 Python and C libraries that could be causing problems here.

djhoese avatar Jun 03 '25 15:06 djhoese

@djhoese thanks for the feedback. I'm testing this on a Linux server with 26 CPUs & 192 GB RAM (shared with others...) and reading from a fast disk. I tried DASK_NUM_WORKERS=1 and dask.config.set(num_workers=1) with no impact.

I initially tested netCDF4, and it was also very slow opening FCI files (about 10x slower compared with h5py). It seems that the netCDF4 library loads all metadata, and FCI files have lots of metadata, while h5py seems to be lazy, loading just what we ask for (this was not tested in detail...).
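
A very rough way to see that open-time difference, reusing file_list from the script above (not a rigorous benchmark):

# Rough open-time comparison between netCDF4 and h5py for one FCI file.
import time

import h5py
import netCDF4

path = file_list[0]

t0 = time.time()
with netCDF4.Dataset(path, "r"):
    pass
print(f"netCDF4 open/close: {time.time() - t0:.2f} s")

t0 = time.time()
with h5py.File(path, "r"):
    pass
print(f"h5py open/close:    {time.time() - t0:.2f} s")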

Your point on the caching is correct, loading a single channel with satpy is not optimal. In fact, loading 4 channels in one single call takes only 12.5 seconds (while a single channel takes about 8 sec). My main concern was not the runtime, but the memory usage. I looked into this in more detail after seeing a large >20 GB RAM usage when loading several FCI high-resolution channels (500 m and 1 km), which would put some pressure on near-real-time production. The simple reader using h5py does not have all the possibilities of satpy, but it allows controlling memory usage in detail. Independently of this, satpy is fantastic :)

endutra avatar Jun 03 '25 17:06 endutra

I'll be honest that out of the maintainers of Satpy I'm probably the least familiar with FCI's files, so that's an interesting point about the netCDF4 versus h5py metadata loading. With 1 dask worker I wouldn't worry too much about this; OMP_NUM_THREADS=1 is also important when you have a machine with many cores, but with a single dask worker it shouldn't be too big of a deal.

As far as memory usage goes, the other thing to keep in mind is that you're computing .values and loading the entire numpy array into memory. If you use Satpy and write to geotiff this is less of a problem, as the chunks of data should be written and then thrown away. Whether this happens optimally, or even matters when comparing Satpy's .values to h5py numpy arrays, is a whole other question.
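
As a rough illustration of that streaming pattern, reusing file_list from the script above (the output directory is a placeholder and the geotiff writer needs rasterio installed):

# Sketch: let dask write chunks to disk instead of materializing full arrays with .values.
import hdf5plugin  # needed for the FCI file compression, as in the script above
from satpy.scene import Scene

scn = Scene(filenames=file_list, reader=["fci_l1c_nc"])
scn.load(["ir_105", "ir_123"], calibration="radiance")
# save_datasets computes and writes chunk by chunk, so the full-resolution
# numpy arrays never have to sit in memory all at once.
scn.save_datasets(writer="geotiff", base_dir="/tmp/fci_out")  # placeholder output dir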

Let's see what others have to say. Maybe someone can try replacing the NetCDF-based reader in Satpy with an h5py reader and see how that changes things.

djhoese avatar Jun 03 '25 17:06 djhoese

One more thing: satpy also reads other arrays when loading a single channel, for example the quality flags and indices, which are then used to fill bad data with NaN values. But I agree that using 10 times the memory the channel values are supposed to take is probably a sign of something fishy happening.

mraspaud avatar Jun 04 '25 06:06 mraspaud

FWIW I have also noticed segfaults with FCI when I have a loop over timeslots. I also need to call the garbage collector to avoid these errors. Note: this is for separate timeslots, not like the case here where the same timeslot is loaded several times.

simonrp84 avatar Jun 04 '25 10:06 simonrp84

Sorry for joining the discussion this late. Unfortunately I don't have too much to add on top of what has been discussed already. But I have to slightly correct @mraspaud's response: it is true that we "load" the channel pixel quality arrays automatically together with each channel, since they're a defined ancillary variable in the original NetCDF variable. However, we do not actually use that data to filter out bad pixels in the reader, so the arrays just sit, daskified, inside the ancillary_variables attribute; hence they shouldn't add too much (memory) overhead to the process. So yeah, the difference you're observing is surprising to me too!

On a side note, I'd still vote for removing the automated loading of the pixel quality, just to avoid the extra reading and daskification overhead, since that data is not actually used and loaded 99.9% of the time. Users who want the pixel quality can load it explicitly with e.g. ir_105_pixel_quality and apply it to the data.
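
As a sketch of that explicit-loading approach, reusing file_list from the script above (the "== 0" good-pixel convention below is an assumption and should be checked against the product documentation):

# Sketch: load the pixel quality explicitly and mask the channel with it.
import hdf5plugin  # needed for the FCI file compression, as in the script above
from satpy.scene import Scene

scn = Scene(filenames=file_list, reader=["fci_l1c_nc"])
scn.load(["ir_105", "ir_105_pixel_quality"])
# Assumption: 0 marks good pixels; everything else becomes NaN.
ir_105_masked = scn["ir_105"].where(scn["ir_105_pixel_quality"] == 0)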

ameraner avatar Jun 27 '25 15:06 ameraner