Error when using delayed writing with one area but several writers/formats
Describe the bug
I get an error when trying to make AVHRR images using the delayed writing functionality (as used in trollflow2). See the example code below. The original error was discovered when running trollflow2 with the minimal product list extracted below:
reader: avhrr_l1b_aapp
mask_area: true
delay_composites: true
use_tmp_file: true
radius_of_influence: 10000
metadata_aliases:
  variant:
    GDS: global
    EARS: regional
    DR: direct_readout
min_coverage: 25
areas:
  euron1:
    areaname: euron1
    products:
      overview:
        min_sunlight_coverage: 10
        productname: overview
        formats:
          - format: tif
            writer: geotiff
            blockxsize: 512
            blockysize: 512
            tiled: true
          - format: png
            writer: simple_image
          - format: jpg
            writer: simple_image
            fill_value: 0
...
If I comment out both the png and the jpg and leave only the tiff, it works fine. If I comment out the tiff and leave the png/jpg, it also works fine!
To Reproduce
I managed to reproduce it with this minimal code below:
from satpy import Scene
from satpy.writers import compute_writer_results

# filenames: list of AVHRR AAPP level-1b files
scene = Scene(filenames=filenames, reader='avhrr_l1b_aapp')
parameters = ['overview']
scene.load(parameters)
AREAID = 'euron1'
local_scn = scene.resample(AREAID)
start_time_txt = local_scn.start_time.strftime('%Y-%m-%d %H:%M')
timestamp = local_scn.start_time.strftime('%Y%m%d_%H%M')
obj1 = local_scn.save_dataset('overview', writer='geotiff',
                              blockxsize=512,
                              blockysize=512,
                              tiled=True,
                              compute=False,
                              filename='avhrr_ovw_%s_%s.tif' % (timestamp, AREAID))
obj2 = local_scn.save_dataset('overview', writer='simple_image', fill_value=0,
                              compute=False,
                              filename='avhrr_ovw_%s_%s.jpg' % (timestamp, AREAID))
obj3 = local_scn.save_dataset('overview', writer='simple_image',
                              compute=False,
                              filename='avhrr_ovw_%s_%s.png' % (timestamp, AREAID))
objs = [obj1, obj2, obj3]
compute_writer_results(objs)
Expected behavior
I would expect three images to be written; instead I get only two, the png and the tiff.
Actual results
Traceback (most recent call last):
File "/home/myuser/test_satpy_save_images.py", line 73, in <module>
compute_writer_results(objs)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "/path/to/conda/env/lib/python3.13/site-packages/satpy/writers/core/compute.py", line 139, in compute_writer_results
da.compute(delayeds)
~~~~~~~~~~^^^^^^^^^^
File "/path/to/conda/env/lib/python3.13/site-packages/dask/base.py", line 681, in compute
results = schedule(expr, keys, **kwargs)
File "/path/to/conda/env/lib/python3.13/site-packages/pyresample/future/resamplers/nearest.py", line 62, in query_no_distance
distance_array, index_array = kdtree.query(
~~~~~~~~~~~~^
coords,
^^^^^^^
...<2 lines>...
distance_upper_bound=radius,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mask=mask)
^^^^^^^^^^
File "pykdtree/kdtree.pyx", line 274, in pykdtree.kdtree.KDTree.query
ValueError: Mask must have the same size as data points
Environment Info:
- OS: Linux (RHEL9)
- Satpy Version: 0.57.0
- Pyresample Version:
- bokeh: 3.8.0
- trollimage: 1.26.0
See requirements file attached!
Additional context
There is some randomness involved here! Sometimes the minimal script above actually runs without error, but most of the time it errors...
Try upgrading trollimage and let me know if you get the same results.
Unfortunately, same behaviour using the latest Trollimage (1.27.0) and latest Satpy (0.59)! At the moment only with trollflow2 and the full product list, though. I am struggling to make a small test case that fails offline... so far I can't...
Okay, right now I have this env, given by the requirements file attached below.
requirements-conda-avhrr-imagery-0.3.2.txt
And I can produce 6 images successfully with this (minimal) trollflow2 product list:
...
min_coverage: 25
areas:
  euron1:
    areaname: euron1
    products:
      overview:
        min_sunlight_coverage: 10
        productname: overview
        formats:
          - format: tif
            writer: geotiff
            blockxsize: 512
            blockysize: 512
            tiled: true
          - format: png
            writer: simple_image
          - format: jpg
            writer: simple_image
            fill_value: 0
      night_fog:
        productname: night_fog
        formats:
          - format: tif
            writer: geotiff
            blockxsize: 512
            blockysize: 512
            tiled: true
            fname_pattern: "{platform_name:s}_{start_time:%Y%m%d_%H%M}_{areaname:s}_night_fog.{format}"
          - format: png
            writer: simple_image
            fname_pattern: "{platform_name:s}_{start_time:%Y%m%d_%H%M}_{areaname:s}_nightfog.{format}"
          - format: jpg
            writer: simple_image
            fill_value: 0
            fname_pattern: "{platform_name:s}_{start_time:%Y%m%d_%H%M}_{areaname:s}_nightfog.{format}"
But it fails with the full product list, as attached below:
This product list failed at first, but when I ran it later it worked fine. After it succeeded, I ran the original full list again, just to confirm that it still fails...
And it always fails with the same error (the pykdtree mask size error)?
My wild guess is that the various refactorings and removals of "dask delayed" functions has introduced a case where two separate calculations are producing a dask task with the same name. When dask goes to compute and use the results of that task it sometimes gets the expected result and sometimes gets the result for the other band's calculation.
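A minimal, hypothetical sketch of that kind of collision (not satpy code, just the generic dask behaviour being described):

import dask.array as da
import numpy as np

# Two different computations that accidentally end up with the same task name:
a = da.from_array(np.zeros(4), chunks=4, name="collided-task")
b = da.from_array(np.ones(4), chunks=4, name="collided-task")

# Computing them together merges the two graphs by task name, so one
# computation's result silently stands in for the other:
print(da.compute(a, b))  # both outputs come back identical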
Sounds plausible! Yes, so far I have seen only this error every time!
I see both a map_blocks and a delayed usage in https://github.com/pytroll/satpy/blob/main/satpy/readers/aapp_l1b.py. It may be worth printing out the delayed object's .name and the map_blocks array result's .name and seeing if they are ever equal to previous names. By that I mean you shouldn't compare the delayed name with the map_blocks name, but rather compare the delayed name for one band/geolocation result with the name for another band/geolocation array. Hopefully this makes sense...
Ok, guessing this:
265d264
< logger.debug(f"output_delayed name: {output_delayed.name}")
660c659
< mapblocks_obj = da.map_blocks(_ir_calibrate_numpy,
---
> return da.map_blocks(_ir_calibrate_numpy,
664,665c663
< logger.debug(f"mapblocks_obj name: {mapblocks_obj.name}")
< return mapblocks_obj
---
>
?
Yes and similarly for the delayed object before or after it is converted to a dask array (I think it has the same name before and after).
I have:
def _interpolate_arrays(self, *input_arrays, geolocation=False):
    lines = input_arrays[0].shape[0]
    try:
        interpolator = self._create_40km_interpolator(lines, *input_arrays, geolocation=geolocation)
    except ImportError:
        logger.warning("Could not interpolate, python-geotiepoints missing.")
        output_arrays = input_arrays
    else:
        output_delayed = delayed(interpolator.interpolate, nout=3)()
        logger.debug(f"output_delayed name: {output_delayed.name}")
        output_arrays = [da.from_delayed(out_array, (lines, 2048), in_array.dtype)
                         for in_array, out_array in zip(input_arrays, output_delayed)]
    return output_arrays
And
...
    bandcor_2 = radtempcnv[0, irchn, 1] / 1e5
    bandcor_3 = radtempcnv[0, irchn, 2] / 1e6

    mapblocks_obj = da.map_blocks(_ir_calibrate_numpy,
                                  rad, mask, cwnum, bandcor_2, bandcor_3,
                                  dtype=rad.dtype,
                                  meta=np.ndarray((), dtype=rad.dtype))
    logger.debug(f"mapblocks_obj name: {mapblocks_obj.name}")
    return mapblocks_obj
Then I see this in the log: (the scene around 19:08 errored)
(base) [safusr.u@lxserv1096 satellit]$ grep "output_delayed" avhrr_imagery_processor.log
[2025-11-11 17:28:22,012 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-842ca8a43ce6e8da930705a9c95d1eb9')
[2025-11-11 17:28:22,093 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-ecb34356680160d7ba036aa06a01e06b')
[2025-11-11 18:24:19,554 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-15c8df91e8dd7f42226de795131190c6')
[2025-11-11 18:24:19,564 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-f056eae4065d95414fc243a0ee658af4')
[2025-11-11 19:08:45,932 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-d0bcd2dddc829e169fdeba8727794ba0')
[2025-11-11 19:08:45,943 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-443bce611f3cca78e06c0e364779bb53')
[2025-11-11 20:04:38,033 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-ee96fd34dfc9e200da0baa8139f4c6d4')
[2025-11-11 20:04:38,445 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-9c7fe50f72003b57c3c1358429f5970a')
[2025-11-11 20:48:55,036 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-46add410c22ea14440f66def117263e7')
[2025-11-11 20:48:55,045 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-b9d1b286f8069c7db11f14e020b62e9b')
[2025-11-11 21:43:44,433 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-c5bb329e7183522ef131a3538c93dfec')
[2025-11-11 21:43:44,505 DEBUG satpy.readers.aapp_l1b] output_delayed name: Delayed('getattr-51d8c76dab28a903b79383f79ca843eb')
(base) [safusr.u@lxserv1096 satellit]$ grep "mapblocks_obj name" avhrr_imagery_processor.log
[2025-11-11 17:28:22,237 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-1f3cc7fe62af2949f5f3c8464c1d3851
[2025-11-11 17:28:22,305 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-00e0c17662fd2145104c09c271a8873e
[2025-11-11 17:28:22,374 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-99f81273964352911616abbd2982e795
[2025-11-11 18:24:19,703 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-6e8611279f8e38b9e592fa34e5ff5670
[2025-11-11 18:24:19,772 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-85e1aed2268bc009c18a3d7d06699fe7
[2025-11-11 18:24:19,905 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-ad8e493f438c6ebd2577ce6dbdb8c4d2
[2025-11-11 19:08:46,028 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-822eae0e3f7f470fdb9f1f1ebbc7da8c
[2025-11-11 19:08:46,106 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-cdf026775f9886daf30531f726346545
[2025-11-11 19:08:46,264 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-040328b7613c07fc55a5d96f24e18065
[2025-11-11 20:04:38,113 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-ea4b4189603670fcc5db6357742e8bd3
[2025-11-11 20:04:38,278 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-1d495b4e8331190a4dd223a5f14bb9e4
[2025-11-11 20:04:38,356 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-06557651cb69983bffdbc323129da7ae
[2025-11-11 20:48:55,276 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-e287dc12881a6bb0381a88f1ddd12dca
[2025-11-11 20:48:55,352 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-d11141f198b03ac52ea69089f8ff0326
[2025-11-11 20:48:55,430 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-792622b719610e03fb3cb2abcd2b83fa
[2025-11-11 21:43:44,497 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-26e8e35346760bbc077c15916d4b2618
[2025-11-11 21:43:44,562 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-a751e8e453516979bbdd234ff911e6fa
[2025-11-11 21:43:44,737 DEBUG satpy.readers.aapp_l1b] mapblocks_obj name: _ir_calibrate_numpy-bb59dd29e1fb980adf3697939ec859b8
I can't really see anything fishy here, right?
And the execution where you got that output failed?
Yes, the attempt to create the products using the above "full" product list failed for the scene around 19:08 UTC. All the other scenes were okay: all products created and saved. (Sorry for the slow/intermittent answers - in between meetings/travels - now on the train back home from Stockholm.)
No problem. I'm going to try to finish some other stuff today but I'm hoping I can try this myself, maybe with a different reader and see what I get.
Just for info (maybe not surprising?): I see that the "failed" files with names like tmp1776461ob50qzai actually look okay; they are tiff files and can be displayed nicely.
And the failure always seem to point at the "overview" RGB image:
%> grep "ERROR: " avhrr_imagery_processor.log | grep -v "Could not load dataset"
[ERROR: 2025-11-12 22:08:40 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-12 22:08:40 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-12 22:08:47 : trollflow2.launcher] Missing file: /san1/polar_out/direct_readout/imagery/Metop-C_20251112_2157_euron1_overview.tif
[ERROR: 2025-11-13 07:52:50 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-13 07:52:50 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-13 07:52:59 : trollflow2.launcher] Missing file: /san1/polar_out/direct_readout/imagery/Metop-B_20251113_0736_euron1_overview.tif
[ERROR: 2025-11-13 08:37:26 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-13 08:37:26 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-13 08:37:33 : trollflow2.launcher] Missing file: /san1/polar_out/direct_readout/imagery/Metop-C_20251113_0821_euron1_overview.tif
[ERROR: 2025-11-13 11:55:39 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-13 11:55:39 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-13 11:55:50 : trollflow2.launcher] Missing file: /san1/polar_out/direct_readout/imagery/Metop-C_20251113_1141_euron1_overview.tif
[ERROR: 2025-11-13 13:30:42 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-13 13:30:42 : trollflow2.launcher] Process crashed
[ERROR: 2025-11-13 13:30:47 : trollflow2.launcher] Missing file: /san1/polar_out/direct_readout/imagery/Metop-C_20251113_1321_euron1_overview.tif
But I am pretty sure I could get it to fail also without that RGB; I think I tested that. But I will continue...
Are you saying it is always overview that is the problem? This is AVHRR so all of the bands are the same resolution aren't they? So resampling isn't necessary to create the composites... @pnuu or @adybbroe is reduce_data=False set in trollflow2 or does it use the default?
It is errors like this where I really wish I would have put more information into the error messages (ex. the error from pykdtree in your original post).
@adybbroe do you have the ability and comfort to modify your pyresample installation in place to print/log more information if I can't reproduce this locally?
Yes, absolutely! It is a conda env created and deployed with Ansible, but we are on our development server, so I have full control and can change it in place and restart the process.
Are you saying it is always overview that is the problem? This is AVHRR so all of the bands are the same resolution aren't they? So resampling isn't necessary to create the composites... @pnuu or @adybbroe is reduce_data=False set in trollflow2 or does it use the default?
Yeah, all same resolution!
Ok, so I've started trying to reproduce this and look at a few possible suspects, but I'm getting very confused about how this specific error is being produced for this specific reader and this specific set of products. The rest of this comment is mostly me thinking "out loud":
The error you are getting is in the KDTree object (pykdtree) when it is being queried. The query method of the KDTree is being given a "mask" parameter which is essentially the fill mask of the input data. The error, though, is saying this mask is not the same size as the input coordinates that were passed to the KDTree when it was created. Note that both of these arrays are filtered by "valid input coordinates", so they only include pixels whose coordinates have valid lon/lats (-180/180, -90/90).
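For illustration, a tiny standalone sketch (made-up sizes, not pyresample code) of the check that is failing:

import numpy as np
from pykdtree.kdtree import KDTree

tree_points = np.random.rand(1000, 2)       # kdtree.n == 1000
kdtree = KDTree(tree_points)
query_points = np.random.rand(50, 2)

good_mask = np.zeros(1000, dtype=bool)      # one entry per tree point -> works
kdtree.query(query_points, mask=good_mask)

bad_mask = np.zeros(900, dtype=bool)        # length != kdtree.n
kdtree.query(query_points, mask=bad_mask)   # ValueError: Mask must have the same size as data points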
Additionally, this is AVHRR data, which should be all the same resolution for all bands. If the data is split into chunks then it could be that one chunk is getting mixed up with another, but I've never seen two dask chunks be confused with each other. BUT your full product list also has delay_composites: True, which means generate=False gets passed to Scene.load, so only the pure band data should be resampled and the composites really should not be a problem here.
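On the Satpy side that roughly corresponds to (a sketch, using the product and area names from this issue):

# delay_composites: true -> load the composite recipe without generating it,
# so only the raw bands get resampled; the composite is built afterwards.
scene.load(["overview"], generate=False)
local_scn = scene.resample("euron1")  # overview is generated here, from the resampled bands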
So, the variables that we may need to play with:
- Number of workers. Not something I'm particularly worried about or want to play with, but it may increase or decrease the times when the error shows up.
- Chunk size. Since the bare bands are being resampled, I'd be curious if this is dependent on the number of chunks the inputs are being split into (see the sketch after this list). A large swath would possibly mean more chunks, but you've also said @adybbroe that you see the same inputs sometimes fail, sometimes pass.
- Data going over the poles. Are failures only seen with data going over the poles?
- Is geolocation different for the 3a/3b?
- Your full product list has 2 areas. Maybe the number of areas increases the number of KDTree queries, increasing the chances for dask to mix some things up. However, there should still only be one KDTree for each input geolocation (SwathDefinition). It is unlikely that dask is mixing up the KDTree instances... I hope.
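On the chunk-size point, a quick way to inspect it could be something like (band name assumed; any loaded channel works):

print(scene["4"].data.chunks)      # chunk sizes of the swath band's dask array
print(scene["4"].data.numblocks)   # number of blocks per dimension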
I need to come up with some things you can print out to give us more information.
By default Trollflow2 has reduce_data=True.
@adybbroe if you're comfortable doing it, could you add a log or print just above this line:
https://github.com/pytroll/pyresample/blob/1f4c7fbe3985be068b4130fac7898f71eb7b5987/pyresample/future/resamplers/nearest.py#L62
And print out valid_input_index.shape, mask.shape, and kdtree.n.
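A minimal sketch of that debug line (variable names as in the linked source; the module's logger is assumed):

# just above the kdtree.query(...) call in query_no_distance():
logger.debug("Before KDTree query in nearest: %s %s %s",
             valid_input_index.shape, mask.shape, kdtree.n)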
@pnuu Thanks. I'm realizing now that this is swath data, so reduce_data shouldn't play a part.
And just for completeness, the default resampling values are:
GLOBAL_RESAMPLER_DEFAULTS = {
    "radius_of_influence": None,
    "resampler": "nearest",
    "reduce_data": True,
    "cache_dir": None,
    "mask_area": False,
    "epsilon": 0.0,
}
All the AVHRR AAPP L1b channels share the same navigation data.
Yes, I will try to add this debug, @djhoese. Concerning your items above:
- Data from our own antenna in Norrköping, and we only rarely cover the pole, and I think very rarely with AVHRR (it happens more often with VIIRS) - but I will keep my eyes open for that!
- 3a and 3b have the same geolocation.
- I have a feeling the chance of failing increases with the number of areas, but that is not backed by much testing...
[2025-11-13 21:04:10,466 DEBUG pyresample.future.resamplers.nearest] Before KDTree query in nearest: (4298, 2048) (8802304,) 8325120
[2025-11-13 21:04:10,641 ERROR trollflow2.launcher] Process crashed
That is valid_input_index.shape, mask.shape, and kdtree.n
But before that:
[2025-11-13 21:04:03,094 DEBUG pyresample.future.resamplers.nearest] Before KDTree query in nearest: (4298, 2048) (8802304,) 8802304
[2025-11-13 21:04:03,302 DEBUG pyresample.future.resamplers.nearest] Before KDTree query in nearest: (4298, 2048) (8802304,) 8802304
[2025-11-13 21:04:03,694 DEBUG pyresample.future.resamplers.nearest] Before KDTree query in nearest: (4298, 2048) (8802304,) 8802304
[2025-11-13 21:04:05,027 DEBUG pyresample.future.resamplers.nearest] Before KDTree query in nearest: (4298, 2048) (8802304,) 8802304
[2025-11-13 21:04:05,334 INFO pyninjotiff.ninjotiff] Successfully created a NinJo tiff file: '/xxxxxxx/imagery/tmp1822051io0rqs_s'
Awesome! Any chance you can get me these data files?