
Using dask.distributed.Client with dc.load fails with a mysterious pickle error

Open alexgleith opened this issue 6 months ago • 2 comments

I'm doing a fairly simple load, like this:

data = dc.load(
    product="s2_l2a",
    measurements=["red", "green", "blue"],
    output_crs="EPSG:32750",
    resolution=10,
    time=("2025-03-01", "2025-03-14"),
    x=(float(bbox.left), float(bbox.right)),
    y=(float(bbox.bottom), float(bbox.top)),
    dask_chunks={},
    group_by="solar_day",
)

data

and it's failing with the error below. If I remove Dask and load directly, it works fine. It also works fine using Dask without a dask.distributed.Client.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File /opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py:60, in dumps(x, buffer_callback, protocol)
     59 try:
---> 60     result = pickle.dumps(x, **dump_kwargs)
     61 except Exception:

AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File /opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py:65, in dumps(x, buffer_callback, protocol)
     64 buffers.clear()
---> 65 pickler.dump(x)
     66 result = f.getvalue()

AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
File /opt/homebrew/lib/python3.11/site-packages/distributed/protocol/serialize.py:366, in serialize(x, serializers, on_error, context, iterate_collection)
    365 try:
--> 366     header, frames = dumps(x, context=context) if wants_context else dumps(x)
    367     header["serializer"] = name

File /opt/homebrew/lib/python3.11/site-packages/distributed/protocol/serialize.py:78, in pickle_dumps(x, context)
     76     writeable.append(not f.readonly)
---> 78 frames[0] = pickle.dumps(
     79     x,
     80     buffer_callback=buffer_callback,
     81     protocol=context.get("pickle-protocol", None) if context else None,
     82 )
     83 header = {
     84     "serializer": "pickle",
     85     "writeable": tuple(writeable),
     86 }

File /opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py:77, in dumps(x, buffer_callback, protocol)
     76     buffers.clear()
---> 77     result = cloudpickle.dumps(x, **dump_kwargs)
     78 except Exception:

File /opt/homebrew/lib/python3.11/site-packages/cloudpickle/cloudpickle.py:1537, in dumps(obj, protocol, buffer_callback)
   1536 cp = Pickler(file, protocol=protocol, buffer_callback=buffer_callback)
-> 1537 cp.dump(obj)
   1538 return file.getvalue()

File /opt/homebrew/lib/python3.11/site-packages/cloudpickle/cloudpickle.py:1303, in Pickler.dump(self, obj)
   1302 try:
-> 1303     return super().dump(obj)
   1304 except RuntimeError as e:

TypeError: cannot pickle 'weakref.ReferenceType' object

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 data = data.compute()

File /opt/homebrew/lib/python3.11/site-packages/xarray/core/dataset.py:714, in Dataset.compute(self, **kwargs)
    690 """Manually trigger loading and/or computation of this dataset's data
    691 from disk or a remote source into memory and return a new dataset.
    692 Unlike load, the original dataset is left unaltered.
   (...)    711 dask.compute
    712 """
    713 new = self.copy(deep=False)
--> 714 return new.load(**kwargs)

File /opt/homebrew/lib/python3.11/site-packages/xarray/core/dataset.py:541, in Dataset.load(self, **kwargs)
    538 chunkmanager = get_chunked_array_type(*lazy_data.values())
    540 # evaluate all the chunked arrays simultaneously
--> 541 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    542     *lazy_data.values(), **kwargs
    543 )
    545 for k, data in zip(lazy_data, evaluated_data, strict=False):
    546     self.variables[k].data = data

File /opt/homebrew/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py:85, in DaskManager.compute(self, *data, **kwargs)
     80 def compute(
     81     self, *data: Any, **kwargs: Any
     82 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     83     from dask.array import compute
---> 85     return compute(*data, **kwargs)

File /opt/homebrew/lib/python3.11/site-packages/dask/base.py:681, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    678     expr = expr.optimize()
    679     keys = list(flatten(expr.__dask_keys__()))
--> 681     results = schedule(expr, keys, **kwargs)
    683 return repack(results)

File /opt/homebrew/lib/python3.11/site-packages/distributed/protocol/serialize.py:284, in serialize(x, serializers, on_error, context, iterate_collection)
    282     return x.header, x.frames
    283 if isinstance(x, Serialize):
--> 284     return serialize(
    285         x.data,
    286         serializers=serializers,
    287         on_error=on_error,
    288         context=context,
    289         iterate_collection=True,
    290     )
    292 # Note: don't use isinstance(), as it would match subclasses
    293 # (e.g. namedtuple, defaultdict) which however would revert to the base class on a
    294 # round-trip through msgpack
    295 if iterate_collection is None and type(x) in (list, set, tuple, dict):

File /opt/homebrew/lib/python3.11/site-packages/distributed/protocol/serialize.py:391, in serialize(x, serializers, on_error, context, iterate_collection)
    389         str_x = str(x)[:10000]
    390     except Exception:
--> 391         raise TypeError(msg) from exc
    392     raise TypeError(msg, str_x) from exc
    393 else:  # pragma: nocover

TypeError: Could not serialize object of type _HLGExprSequence

Software versions are:

Datacube version: 1.9.2
Dask version: 2025.5.0
Dask distributed version: 2025.5.0
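As an aside on where the two serialization errors in the trace come from: the first AttributeError can be reproduced with the standard library alone, since the built-in pickler refuses any function defined in a local scope, and weak references have no pickle support at all, which matches the final TypeError from the cloudpickle fallback. A minimal sketch (this Mapper is a stand-in for illustration, not the class in the real traceback):

```python
import pickle
import weakref


class Mapper:
    def __init__(self):
        # A lambda created inside __init__ is a "local object": the
        # stdlib pickler cannot look it up by qualified name.
        self.fn = lambda x: x


try:
    pickle.dumps(Mapper())
except AttributeError as e:
    print(e)  # ... local object 'Mapper.__init__.<locals>.<lambda>'

# Weak references cannot be pickled by either pickler, which is what
# the cloudpickle fallback ultimately trips over.
try:
    pickle.dumps(weakref.ref(Mapper()))
except TypeError as e:
    print(e)
```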

alexgleith avatar May 15 '25 06:05 alexgleith

I'm pretty sure this is one of two known Dask issues in 1.9.[0-2]: one was fixed in 1.9.3, and the other is fixed in develop and will be included in the 1.9.4 release, which I hope to get out soon.

SpacemanPaul avatar May 16 '25 01:05 SpacemanPaul

Almost certainly fixed by this: https://github.com/opendatacube/datacube-core/pull/1776 (which is not released yet)

SpacemanPaul avatar May 16 '25 01:05 SpacemanPaul

Fixed in 1.9.3

SpacemanPaul avatar Jun 27 '25 06:06 SpacemanPaul

This is still causing issues with a fairly simple load using Dask. Versions are as follows:

Python      : 3.11.11
datacube    : 1.9.6
dask        : 2025.7.0
distributed : 2025.7.0

A minimal reproducible example is as follows (it should work with any product):

from datacube import Datacube
from dask.distributed import Client


dc = Datacube(app="test")
client = Client(threads_per_worker=8, n_workers=2)

datasets = dc.find_datasets(
    product="ls9_c2l2_sr",
    limit=1
)

data = dc.load(
    datasets=datasets,
    measurements=["red"],
    output_crs="EPSG:4326",
    resolution=0.0001,
    dask_chunks={"time": 1, "longitude": 1000, "latitude": 1000},
)

data.compute()
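As a stopgap, the failing serialization path can be sidestepped by computing on the local threaded scheduler rather than through the distributed Client, consistent with the observation above that plain Dask works. A sketch with a stand-in array, since the real graph needs a datacube index (on the real data this would be `data.compute(scheduler="threads")`):

```python
import dask.array as da

# Stand-in for the lazy array that dc.load would return.
arr = da.ones((4, 1000, 1000), chunks=(1, 1000, 1000))

# An explicit scheduler= overrides the registered distributed Client,
# so the graph stays in-process and is never pickled for transport.
total = arr.sum().compute(scheduler="threads")
print(total)  # 4000000.0
```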

alexgleith avatar Jul 30 '25 02:07 alexgleith

Stack trace:

/opt/homebrew/lib/python3.11/site-packages/distributed/node.py:187: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 57294 instead
  warnings.warn(
Querying product Product(name='ls9_c2l2_sr', id_=1)
--- Logging error ---
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 80, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1537, in dumps
    cp.dump(obj)
  File "/opt/homebrew/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1303, in dump
    return super().dump(obj)
           ^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'weakref.ReferenceType' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1110, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 953, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 687, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 377, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
  File "/opt/homebrew/lib/python3.11/site-packages/dask/_expr.py", line 90, in __str__
    s = ", ".join(self._operands_for_repr())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/dask/_expr.py", line 1097, in _operands_for_repr
    f"name={self.operand('name')!r}",
            ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/dask/_expr.py", line 224, in operand
    return self.operands[type(self)._parameters.index(key)]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 'name' is not in list
Call stack:
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/traitlets/config/application.py", line 1075, in launch_instance
    app.start()
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel/kernelapp.py", line 739, in start
    self.io_loop.start()
  File "/opt/homebrew/lib/python3.11/site-packages/tornado/platform/asyncio.py", line 205, in start
    self.asyncio_loop.run_forever()
  File "/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel/kernelbase.py", line 545, in dispatch_queue
    await self.process_one()
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel/kernelbase.py", line 534, in process_one
    await dispatch(*args)
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel/kernelbase.py", line 437, in dispatch_shell
    await result
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel/ipkernel.py", line 362, in execute_request
    await super().execute_request(stream, ident, parent)
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel/kernelbase.py", line 778, in execute_request
    reply_content = await reply_content
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel/ipkernel.py", line 449, in do_execute
    res = shell.run_cell(
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/ipykernel/zmqshell.py", line 549, in run_cell
    return super().run_cell(*args, **kwargs)
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/IPython/core/interactiveshell.py", line 3047, in run_cell
    result = self._run_cell(
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/IPython/core/interactiveshell.py", line 3102, in _run_cell
    result = runner(coro)
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/IPython/core/async_helpers.py", line 128, in _pseudo_sync_runner
    coro.send(None)
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/IPython/core/interactiveshell.py", line 3306, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/IPython/core/interactiveshell.py", line 3489, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "/Users/alex/Library/Python/3.11/lib/python/site-packages/IPython/core/interactiveshell.py", line 3549, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/0h/8_60bs_52tx9q2mdgszb37gw0000gn/T/ipykernel_79213/605807411.py", line 21, in <module>
    data.compute()
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/core/dataset.py", line 714, in compute
    return new.load(**kwargs)
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/core/dataset.py", line 541, in load
    evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
  File "/opt/homebrew/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
  File "/opt/homebrew/lib/python3.11/site-packages/dask/base.py", line 681, in compute
    results = schedule(expr, keys, **kwargs)
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/client.py", line 3475, in get
    futures = self._graph_to_futures(
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/client.py", line 3365, in _graph_to_futures
    expr_ser = Serialized(*serialize(to_serialize(expr), on_error="raise"))
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 284, in serialize
    return serialize(
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 366, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 78, in pickle_dumps
    frames[0] = pickle.dumps(
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 82, in dumps
    logger.exception("Failed to serialize %s.", x)
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.

alexgleith avatar Jul 30 '25 02:07 alexgleith

I tried older dask and distributed versions (2024.6.0) and got this error:

Querying product Product(name='ls9_c2l2_sr', id_=1)
2025-07-30 12:35:45,746 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x140f37090>
 0. 5381305280
>.
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1537, in dumps
    cp.dump(obj)
  File "/opt/homebrew/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 1303, in dump
    return super().dump(obj)
           ^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'weakref.ReferenceType' object

alexgleith avatar Jul 30 '25 02:07 alexgleith

Reproduced in another environment with the following versions:

Python      : 3.12.10
datacube    : 1.9.6
dask        : 2025.5.1
distributed : 2025.5.1

Traceback:

/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/node.py:187: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43519 instead
  warnings.warn(
Querying product Product(name='ls9_c2l2_sr', id_=1)
--- Logging error ---
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/pickle.py", line 60, in dumps
    result = pickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/pickle.py", line 65, in dumps
    pickler.dump(x)
AttributeError: Can't get local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/pickle.py", line 77, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/cloudpickle/cloudpickle.py", line 1537, in dumps
    cp.dump(obj)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/cloudpickle/cloudpickle.py", line 1303, in dump
    return super().dump(obj)
           ^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'weakref.ReferenceType' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.12/logging/__init__.py", line 1160, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/srv/conda/envs/notebook/lib/python3.12/logging/__init__.py", line 999, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/srv/conda/envs/notebook/lib/python3.12/logging/__init__.py", line 703, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "/srv/conda/envs/notebook/lib/python3.12/logging/__init__.py", line 392, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/dask/_expr.py", line 90, in __str__
    s = ", ".join(self._operands_for_repr())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/dask/_expr.py", line 1094, in _operands_for_repr
    f"name={self.operand('name')!r}",
            ^^^^^^^^^^^^^^^^^^^^
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/dask/_expr.py", line 221, in operand
    return self.operands[type(self)._parameters.index(key)]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 'name' is not in list
Call stack:
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/traitlets/config/application.py", line 1075, in launch_instance
    app.start()
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel/kernelapp.py", line 739, in start
    self.io_loop.start()
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/tornado/platform/asyncio.py", line 211, in start
    self.asyncio_loop.run_forever()
  File "/srv/conda/envs/notebook/lib/python3.12/asyncio/base_events.py", line 645, in run_forever
    self._run_once()
  File "/srv/conda/envs/notebook/lib/python3.12/asyncio/base_events.py", line 1999, in _run_once
    handle._run()
  File "/srv/conda/envs/notebook/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel/kernelbase.py", line 545, in dispatch_queue
    await self.process_one()
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel/kernelbase.py", line 534, in process_one
    await dispatch(*args)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel/kernelbase.py", line 437, in dispatch_shell
    await result
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel/ipkernel.py", line 362, in execute_request
    await super().execute_request(stream, ident, parent)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel/kernelbase.py", line 778, in execute_request
    reply_content = await reply_content
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel/ipkernel.py", line 449, in do_execute
    res = shell.run_cell(
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/ipykernel/zmqshell.py", line 549, in run_cell
    return super().run_cell(*args, **kwargs)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3046, in run_cell
    result = self._run_cell(
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3101, in _run_cell
    result = runner(coro)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
    coro.send(None)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3306, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3488, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3548, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_110/901322381.py", line 21, in <module>
    data.compute()
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py", line 714, in compute
    return new.load(**kwargs)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py", line 541, in load
    evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/dask/base.py", line 681, in compute
    results = schedule(expr, keys, **kwargs)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/client.py", line 3467, in get
    futures = self._graph_to_futures(
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/client.py", line 3357, in _graph_to_futures
    expr_ser = Serialized(*serialize(to_serialize(expr), on_error="raise"))
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py", line 284, in serialize
    return serialize(
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py", line 366, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py", line 78, in pickle_dumps
    frames[0] = pickle.dumps(
  File "/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/pickle.py", line 79, in dumps
    logger.exception("Failed to serialize %s.", x)
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/pickle.py:60, in dumps(x, buffer_callback, protocol)
     59 try:
---> 60     result = pickle.dumps(x, **dump_kwargs)
     61 except Exception:

AttributeError: Can't get local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/pickle.py:65, in dumps(x, buffer_callback, protocol)
     64 buffers.clear()
---> 65 pickler.dump(x)
     66 result = f.getvalue()

AttributeError: Can't get local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
File [/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py:366](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py#line=365), in serialize(x, serializers, on_error, context, iterate_collection)
    365 try:
--> 366     header, frames = dumps(x, context=context) if wants_context else dumps(x)
    367     header["serializer"] = name

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py:78](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py#line=77), in pickle_dumps(x, context)
     76     writeable.append(not f.readonly)
---> 78 frames[0] = pickle.dumps(
     79     x,
     80     buffer_callback=buffer_callback,
     81     protocol=context.get("pickle-protocol", None) if context else None,
     82 )
     83 header = {
     84     "serializer": "pickle",
     85     "writeable": tuple(writeable),
     86 }

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/pickle.py:77](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/pickle.py#line=76), in dumps(x, buffer_callback, protocol)
     76     buffers.clear()
---> 77     result = cloudpickle.dumps(x, **dump_kwargs)
     78 except Exception:

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/cloudpickle/cloudpickle.py:1537](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/cloudpickle/cloudpickle.py#line=1536), in dumps(obj, protocol, buffer_callback)
   1536 cp = Pickler(file, protocol=protocol, buffer_callback=buffer_callback)
-> 1537 cp.dump(obj)
   1538 return file.getvalue()

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/cloudpickle/cloudpickle.py:1303](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/cloudpickle/cloudpickle.py#line=1302), in Pickler.dump(self, obj)
   1302 try:
-> 1303     return super().dump(obj)
   1304 except RuntimeError as e:

TypeError: cannot pickle 'weakref.ReferenceType' object

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[1], line 21
      8 datasets = dc.find_datasets(
      9     product="ls9_c2l2_sr",
     10     limit=1
     11 )
     13 data = dc.load(
     14     datasets=datasets,
     15     measurements=["red"],
   (...)
     18     dask_chunks={"time": 1, "longitude": 1000, "latitude": 1000},
     19 )
---> 21 data.compute()

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py:714](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py#line=713), in Dataset.compute(self, **kwargs)
    690 """Manually trigger loading and[/or](https://sandbox.staging.pik-sel.id/or) computation of this dataset's data
    691 from disk or a remote source into memory and return a new dataset.
    692 Unlike load, the original dataset is left unaltered.
   (...)
    711 dask.compute
    712 """
    713 new = self.copy(deep=False)
--> 714 return new.load(**kwargs)

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py:541](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py#line=540), in Dataset.load(self, **kwargs)
    538 chunkmanager = get_chunked_array_type(*lazy_data.values())
    540 # evaluate all the chunked arrays simultaneously
--> 541 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    542     *lazy_data.values(), **kwargs
    543 )
    545 for k, data in zip(lazy_data, evaluated_data, strict=False):
    546     self.variables[k].data = data

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py:85](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py#line=84), in DaskManager.compute(self, *data, **kwargs)
     80 def compute(
     81     self, *data: Any, **kwargs: Any
     82 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     83     from dask.array import compute
---> 85     return compute(*data, **kwargs)

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/dask/base.py:681](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/dask/base.py#line=680), in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    678     expr = expr.optimize()
    679     keys = list(flatten(expr.__dask_keys__()))
--> 681     results = schedule(expr, keys, **kwargs)
    683 return repack(results)

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py:284](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py#line=283), in serialize(x, serializers, on_error, context, iterate_collection)
    282     return x.header, x.frames
    283 if isinstance(x, Serialize):
--> 284     return serialize(
    285         x.data,
    286         serializers=serializers,
    287         on_error=on_error,
    288         context=context,
    289         iterate_collection=True,
    290     )
    292 # Note: don't use isinstance(), as it would match subclasses
    293 # (e.g. namedtuple, defaultdict) which however would revert to the base class on a
    294 # round-trip through msgpack
    295 if iterate_collection is None and type(x) in (list, set, tuple, dict):

File [/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py:391](https://sandbox.staging.pik-sel.id/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/protocol/serialize.py#line=390), in serialize(x, serializers, on_error, context, iterate_collection)
    389         str_x = str(x)[:10000]
    390     except Exception:
--> 391         raise TypeError(msg) from exc
    392     raise TypeError(msg, str_x) from exc
    393 else:  # pragma: nocover

TypeError: Could not serialize object of type _HLGExprSequence

alexgleith avatar Jul 30 '25 02:07 alexgleith

I can't reproduce on DEA Sandbox with:

Python      : 3.10.15
datacube    : 1.9.6
dask        : 2025.7.0
distributed : 2025.7.0

SpacemanPaul avatar Jul 30 '25 05:07 SpacemanPaul

Ok, trying with a new (old, 3.10) environment:

Python      : 3.10.12
datacube    : 1.9.6
dask        : 2025.7.0
distributed : 2025.7.0
xarray      : 2025.6.1
rasterio    : 1.4.3
--- Logging error ---
Traceback (most recent call last):
  File "[/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 63](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py#line=62), in dumps
    result = pickle.dumps(x, **dump_kwargs)
AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 68](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py#line=67), in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 80](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py#line=79), in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "[/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py", line 1537](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py#line=1536), in dumps
    cp.dump(obj)
  File "[/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py", line 1303](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py#line=1302), in dump
    return super().dump(obj)
TypeError: cannot pickle 'weakref.ReferenceType' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[/usr/lib/python3.10/logging/__init__.py", line 1100](http://localhost:8888/lab/tree/usr/lib/python3.10/logging/__init__.py#line=1099), in emit
    msg = self.format(record)
  File "[/usr/lib/python3.10/logging/__init__.py", line 943](http://localhost:8888/lab/tree/usr/lib/python3.10/logging/__init__.py#line=942), in format
    return fmt.format(record)
  File "[/usr/lib/python3.10/logging/__init__.py", line 678](http://localhost:8888/lab/tree/usr/lib/python3.10/logging/__init__.py#line=677), in format
    record.message = record.getMessage()
  File "[/usr/lib/python3.10/logging/__init__.py", line 368](http://localhost:8888/lab/tree/usr/lib/python3.10/logging/__init__.py#line=367), in getMessage
    msg = msg % self.args
  File "[/usr/local/lib/python3.10/dist-packages/dask/_expr.py", line 90](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/dask/_expr.py#line=89), in __str__
    s = ", ".join(self._operands_for_repr())
  File "[/usr/local/lib/python3.10/dist-packages/dask/_expr.py", line 1097](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/dask/_expr.py#line=1096), in _operands_for_repr
    f"name={self.operand('name')!r}",
  File "[/usr/local/lib/python3.10/dist-packages/dask/_expr.py", line 224](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/dask/_expr.py#line=223), in operand
    return self.operands[type(self)._parameters.index(key)]
ValueError: 'name' is not in list
Call stack:
  File "[/usr/lib/python3.10/runpy.py", line 196](http://localhost:8888/lab/tree/usr/lib/python3.10/runpy.py#line=195), in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "[/usr/lib/python3.10/runpy.py", line 86](http://localhost:8888/lab/tree/usr/lib/python3.10/runpy.py#line=85), in _run_code
    exec(code, run_globals)
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel_launcher.py", line 18](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel_launcher.py#line=17), in <module>
    app.launch_new_instance()
  File "[/usr/local/lib/python3.10/dist-packages/traitlets/config/application.py", line 1075](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/traitlets/config/application.py#line=1074), in launch_instance
    app.start()
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel/kernelapp.py", line 739](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel/kernelapp.py#line=738), in start
    self.io_loop.start()
  File "[/usr/local/lib/python3.10/dist-packages/tornado/platform/asyncio.py", line 211](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/tornado/platform/asyncio.py#line=210), in start
    self.asyncio_loop.run_forever()
  File "[/usr/lib/python3.10/asyncio/base_events.py", line 603](http://localhost:8888/lab/tree/usr/lib/python3.10/asyncio/base_events.py#line=602), in run_forever
    self._run_once()
  File "[/usr/lib/python3.10/asyncio/base_events.py", line 1909](http://localhost:8888/lab/tree/usr/lib/python3.10/asyncio/base_events.py#line=1908), in _run_once
    handle._run()
  File "[/usr/lib/python3.10/asyncio/events.py", line 80](http://localhost:8888/lab/tree/usr/lib/python3.10/asyncio/events.py#line=79), in _run
    self._context.run(self._callback, *self._args)
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py", line 516](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py#line=515), in dispatch_queue
    await self.process_one()
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py", line 505](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py#line=504), in process_one
    await dispatch(*args)
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py", line 397](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py#line=396), in dispatch_shell
    await result
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py", line 368](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py#line=367), in execute_request
    await super().execute_request(stream, ident, parent)
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py", line 752](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py#line=751), in execute_request
    reply_content = await reply_content
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py", line 455](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py#line=454), in do_execute
    res = shell.run_cell(
  File "[/usr/local/lib/python3.10/dist-packages/ipykernel/zmqshell.py", line 577](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/ipykernel/zmqshell.py#line=576), in run_cell
    return super().run_cell(*args, **kwargs)
  File "[/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3077](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py#line=3076), in run_cell
    result = self._run_cell(
  File "[/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3132](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py#line=3131), in _run_cell
    result = runner(coro)
  File "[/usr/local/lib/python3.10/dist-packages/IPython/core/async_helpers.py", line 128](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/IPython/core/async_helpers.py#line=127), in _pseudo_sync_runner
    coro.send(None)
  File "[/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3336](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py#line=3335), in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "[/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3519](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py#line=3518), in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "[/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3579](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py#line=3578), in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_26/3616026081.py", line 21, in <module>
    data.compute()
  File "[/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py", line 715](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py#line=714), in compute
    return new.load(**kwargs)
  File "[/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py", line 542](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py#line=541), in load
    evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
  File "[/usr/local/lib/python3.10/dist-packages/xarray/namedarray/daskmanager.py", line 85](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/xarray/namedarray/daskmanager.py#line=84), in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
  File "[/usr/local/lib/python3.10/dist-packages/dask/base.py", line 681](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/dask/base.py#line=680), in compute
    results = schedule(expr, keys, **kwargs)
  File "[/usr/local/lib/python3.10/dist-packages/distributed/client.py", line 3475](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/client.py#line=3474), in get
    futures = self._graph_to_futures(
  File "[/usr/local/lib/python3.10/dist-packages/distributed/client.py", line 3365](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/client.py#line=3364), in _graph_to_futures
    expr_ser = Serialized(*serialize(to_serialize(expr), on_error="raise"))
  File "[/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py", line 284](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py#line=283), in serialize
    return serialize(
  File "[/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py", line 366](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py#line=365), in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "[/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py", line 78](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py#line=77), in pickle_dumps
    frames[0] = pickle.dumps(
  File "[/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py", line 82](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py#line=81), in dumps
    logger.exception("Failed to serialize %s.", x)
Unable to print the message and arguments - possible formatting error.
Use the traceback above to help find the error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File [/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py:63](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py#line=62), in dumps(x, buffer_callback, protocol)
     62 try:
---> 63     result = pickle.dumps(x, **dump_kwargs)
     64 except Exception:

AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File [/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py:68](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py#line=67), in dumps(x, buffer_callback, protocol)
     67 buffers.clear()
---> 68 pickler.dump(x)
     69 result = f.getvalue()

AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
File [/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py:366](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py#line=365), in serialize(x, serializers, on_error, context, iterate_collection)
    365 try:
--> 366     header, frames = dumps(x, context=context) if wants_context else dumps(x)
    367     header["serializer"] = name

File [/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py:78](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py#line=77), in pickle_dumps(x, context)
     76     writeable.append(not f.readonly)
---> 78 frames[0] = pickle.dumps(
     79     x,
     80     buffer_callback=buffer_callback,
     81     protocol=context.get("pickle-protocol", None) if context else None,
     82 )
     83 header = {
     84     "serializer": "pickle",
     85     "writeable": tuple(writeable),
     86 }

File [/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py:80](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/pickle.py#line=79), in dumps(x, buffer_callback, protocol)
     79     buffers.clear()
---> 80     result = cloudpickle.dumps(x, **dump_kwargs)
     81 except Exception:

File [/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py:1537](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py#line=1536), in dumps(obj, protocol, buffer_callback)
   1536 cp = Pickler(file, protocol=protocol, buffer_callback=buffer_callback)
-> 1537 cp.dump(obj)
   1538 return file.getvalue()

File [/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py:1303](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py#line=1302), in Pickler.dump(self, obj)
   1302 try:
-> 1303     return super().dump(obj)
   1304 except RuntimeError as e:

TypeError: cannot pickle 'weakref.ReferenceType' object

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[9], line 21
      8 datasets = dc.find_datasets(
      9     product="s2_l2a",
     10     limit=1
     11 )
     13 data = dc.load(
     14     datasets=datasets,
     15     measurements=["red"],
   (...)
     18     dask_chunks={"time": 1, "longitude": 1000, "latitude": 1000},
     19 )
---> 21 data.compute()

File [/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py:715](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py#line=714), in Dataset.compute(self, **kwargs)
    691 """Manually trigger loading and/or computation of this dataset's data
    692 from disk or a remote source into memory and return a new dataset.
    693 Unlike load, the original dataset is left unaltered.
   (...)
    712 dask.compute
    713 """
    714 new = self.copy(deep=False)
--> 715 return new.load(**kwargs)

File [/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py:542](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/xarray/core/dataset.py#line=541), in Dataset.load(self, **kwargs)
    539 chunkmanager = get_chunked_array_type(*lazy_data.values())
    541 # evaluate all the chunked arrays simultaneously
--> 542 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    543     *lazy_data.values(), **kwargs
    544 )
    546 for k, data in zip(lazy_data, evaluated_data, strict=False):
    547     self.variables[k].data = data

File [/usr/local/lib/python3.10/dist-packages/xarray/namedarray/daskmanager.py:85](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/xarray/namedarray/daskmanager.py#line=84), in DaskManager.compute(self, *data, **kwargs)
     80 def compute(
     81     self, *data: Any, **kwargs: Any
     82 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     83     from dask.array import compute
---> 85     return compute(*data, **kwargs)

File [/usr/local/lib/python3.10/dist-packages/dask/base.py:681](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/dask/base.py#line=680), in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    678     expr = expr.optimize()
    679     keys = list(flatten(expr.__dask_keys__()))
--> 681     results = schedule(expr, keys, **kwargs)
    683 return repack(results)

File [/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py:284](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py#line=283), in serialize(x, serializers, on_error, context, iterate_collection)
    282     return x.header, x.frames
    283 if isinstance(x, Serialize):
--> 284     return serialize(
    285         x.data,
    286         serializers=serializers,
    287         on_error=on_error,
    288         context=context,
    289         iterate_collection=True,
    290     )
    292 # Note: don't use isinstance(), as it would match subclasses
    293 # (e.g. namedtuple, defaultdict) which however would revert to the base class on a
    294 # round-trip through msgpack
    295 if iterate_collection is None and type(x) in (list, set, tuple, dict):

File [/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py:391](http://localhost:8888/lab/tree/usr/local/lib/python3.10/dist-packages/distributed/protocol/serialize.py#line=390), in serialize(x, serializers, on_error, context, iterate_collection)
    389         str_x = str(x)[:10000]
    390     except Exception:
--> 391         raise TypeError(msg) from exc
    392     raise TypeError(msg, str_x) from exc
    393 else:  # pragma: nocover

TypeError: Could not serialize object of type _HLGExprSequence

alexgleith avatar Jul 30 '25 06:07 alexgleith

Loading with odc-stac works fine:

from pystac.client import Client as StacClient
from odc.stac import load
from dask.distributed import Client
from planetary_computer import sign_url

catalog = StacClient.open("https://planetarycomputer.microsoft.com/api/stac/v1")
items = catalog.search(
    collections=["landsat-c2-l2"],
    max_items=1
).item_collection()

client = Client(processes=False)

data = load(
    items=items,
    measurements=["red"],
    output_crs="EPSG:4326",
    resolution=0.0001,
    dask_chunks={"time": 1, "longitude": 1000, "latitude": 1000},
    patch_url=sign_url,
)

data.compute()

alexgleith avatar Jul 30 '25 07:07 alexgleith

Loading with the driver="rio" argument to dc.load works. So this is a viable workaround for now, but it would be good to resolve the underlying issue.

from datacube import Datacube
from dask.distributed import Client
import os
from odc.stac import configure_s3_access

os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

if "AWS_NO_SIGN_REQUEST" in os.environ:
    del os.environ["AWS_NO_SIGN_REQUEST"]

configure_s3_access(requester_pays=True)

dc = Datacube(app="test")
client = Client(processes=False)

datasets = dc.find_datasets(
    product="ls9_c2l2_sr",
    limit=1
)

data = dc.load(
    datasets=datasets,
    measurements=["red"],
    output_crs="EPSG:4326",
    resolution=0.0001,
    dask_chunks={"time": 1, "longitude": 1000, "latitude": 1000},
    driver="rio"
)

data.compute()

alexgleith avatar Jul 30 '25 10:07 alexgleith

Ok, so a final test: the same environment as before, but using the postgres index driver rather than postgis, and this works (note: no driver="rio"):

from datacube import Datacube
from dask.distributed import Client
import os
from datacube.utils.aws import configure_s3_access

os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

if "AWS_NO_SIGN_REQUEST" in os.environ:
    del os.environ["AWS_NO_SIGN_REQUEST"]

configure_s3_access(requester_pays=True)

dc = Datacube(app="test")
client = Client(processes=False)

datasets = dc.find_datasets(
    product="ls9_c2l2_sr",
    limit=1
)

data = dc.load(
    datasets=datasets,
    measurements=["red"],
    output_crs="EPSG:4326",
    resolution=0.0001,
    dask_chunks={"time": 1, "longitude": 1000, "latitude": 1000},
)

data.compute()

alexgleith avatar Jul 30 '25 11:07 alexgleith

I'm not sure how to read the backtraces, but one of the early ones includes Mapper and some lambda. I changed things around with the mapper in April, and I'm not surprised that lambdas can't be pickled.

Has this worked with the postgis driver on some older 1.9 release for you, so it is a regression?

pjonsson avatar Jul 30 '25 11:07 pjonsson

I changed things around with the mapper in April

What did you change?

Has this worked with the postgis driver on some older 1.9 release for you

I don't know, to be honest. But I'm now 99% sure it's something to do with the PostGIS Index Driver, which @SpacemanPaul says doesn't make any sense at all!

alexgleith avatar Jul 30 '25 22:07 alexgleith

You seem to be able to reproduce it reliably across multiple environments, so there must be something going on either with your setups or with ODC. And keep in mind that I don't know anything about this issue/area; I only saw a keyword I recognized and commented.

The change I was thinking about is #1831, but I also did #1801 to get better type checking. It's possible that your problem was present before my change, though; that is why I asked whether you had been able to do this before.

As a general comment: given something like this (a sketch; it's midnight and I don't know Python pools):

failing = factory(x)
pool.submit(f, failing)

where failing can't be pickled, it might be possible to move the factory(x) call inside the body of f instead, so that the unpicklable object is constructed on the worker and never has to be pickled.
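To illustrate that pattern, here is a minimal, hypothetical reproduction (the Mapper class below is a stand-in with the same shape of problem as the one in the traceback, not the real SQLAlchemy class): an instance that stores a lambda created in __init__ fails plain pickle with the same "Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'" error, while deferring construction to the task body keeps the submitted arguments picklable.

```python
import pickle

# Hypothetical stand-in for a class that stores a lambda on the instance;
# not the real SQLAlchemy Mapper, just the same shape of problem.
class Mapper:
    def __init__(self):
        self.default = lambda: None  # local object -> unpicklable

def try_pickle(obj):
    """Return the exception raised by pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        return exc

# Shipping the constructed object to a worker fails up front, e.g.
# AttributeError: Can't pickle local object 'Mapper.__init__.<locals>.<lambda>'
err = try_pickle(Mapper())
assert isinstance(err, AttributeError)

# The workaround shape: submit only picklable factory arguments, and build
# the problematic object inside the task body, on the worker.
def task(x):
    mapper = Mapper()  # constructed here, never crosses a process boundary
    return x

assert try_pickle(("task", 42)) is None
```

So instead of pool.submit(f, factory(x)), something like pool.submit(task, x) keeps the unpicklable object out of the serialized payload.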

pjonsson avatar Jul 30 '25 22:07 pjonsson

I still don't know what it is that can't be pickled!

In summary, here's what I've tested:

  • Many combinations of dask/xarray; nothing changes
  • Using the rio vs the default driver as an argument to dc.load(); the rio loader works
  • Using the postgres vs the postgis index; postgres works for both the rio and default drivers

So, in short, the non-rio driver fails to pickle something when the index backend is postgis.

alexgleith avatar Jul 30 '25 22:07 alexgleith

This PR includes a bunch of Mapped data structures: https://github.com/opendatacube/datacube-core/pull/1801/files

I don't know if that has any relation to the 'Mapper.__init__.<locals>.<lambda>' in the above stack traces?

alexgleith avatar Jul 30 '25 22:07 alexgleith

I know that the Mapper comes from SQLAlchemy (https://docs.sqlalchemy.org/en/20/orm/mapper_config.html), but I don't know exactly where it resides in a running application.

We're talking about databases and distributing datacube-core though, and I know very little about both those subjects, so someone who knows more will have to chime in on this one.
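For what it's worth, the second failure in the traceback ("cannot pickle 'weakref.ReferenceType' object") looks consistent with ORM-mapped objects too: as far as I understand, SQLAlchemy's instance-state machinery holds weak references, and weakrefs are rejected by pickle (and by cloudpickle, which is why even the cloudpickle fallback in distributed fails). A tiny stdlib-only demonstration of just the weakref part:

```python
import pickle
import weakref

class Record:
    pass

target = Record()
ref = weakref.ref(target)

# Weak references cannot be serialized; pickle raises TypeError, e.g.
# "cannot pickle 'weakref.ReferenceType' object" on recent Pythons.
try:
    pickle.dumps(ref)
    raised = False
except TypeError:
    raised = True

assert raised
assert ref() is target  # the ref itself still works in-process
```

So any task payload that (directly or indirectly) holds a live ORM-mapped instance would hit one of these two errors, which would explain why the failure tracks the index driver rather than the IO driver.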

pjonsson avatar Jul 30 '25 22:07 pjonsson