distributed zarr write fail - OSError: too many open files; P2PConsistencyError: No active shuffle with

Open lbesnard opened this issue 1 month ago • 4 comments

Describe the issue: I'm processing NetCDF files and converting them to Zarr with xarray. For this, I'm using a Coiled cluster, dask, xarray, s3fs...

As a normal user who just wants to process data, I'm seeing seemingly random dask behaviour: occasionally the processing works, but most of the time it fails with what look like various race conditions. I have changed vm_types, nthreads, max_pool_connections and many other settings without success, and the errors in the logs are anything but useful.

Minimal Complete Verifiable Example:

    "coiled_cluster_options": {
      "n_workers": [
        20,
        100
      ],
      "scheduler_vm_types": "m7i-flex.large",
      "worker_vm_types": "m7i-flex.xlarge",
      "allow_ingress_from": "me",
      "compute_purchase_option": "spot_with_fallback",
      "worker_options": {
        "nthreads": 2
      }
    },

I have tried both the p2p and tasks shuffle methods, and end up with the same behaviour either way.


        dask.config.set(
            {
                "array.slicing.split_large_chunks": False,
                "distributed.scheduler.worker-saturation": "inf",
                "dataframe.shuffle.method": "p2p",
            }
        )

and my distributed configuration file:

  scheduler:
    work-stealing: False
    allowed-failures: 1 # fail fast
  worker:
    memory:
      spill: False
      pause: False
      terminate: False

I also tried setting spill, pause and terminate to values such as 0.90, without any improvement.
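Since the error in the title is `OSError: too many open files` (errno 24), one thing that may be worth checking is the per-process limit on open file descriptors. The snippet below is a minimal sketch, assuming a Unix host; the helper name `nofile_limits` is hypothetical and not part of dask or Coiled.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Report the limits of the current process. The soft limit is the one
# that triggers errno 24 ("Too many open files") when it is exceeded.
soft, hard = nofile_limits()
print(f"open-file limit: soft={soft}, hard={hard}")
```

On a cluster, the same function could presumably be run remotely with `Client.run(nofile_limits)` for the workers or `Client.run_on_scheduler(nofile_limits)` for the scheduler, since the limit that matters is the one of the process that raises the error.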

The log output I get is not human-readable. The exception arrives as raw pickled bytes mixed with the client-side traceback.
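The dump that follows appears to be a pickled exception: it contains the protocol header `\x80\x05\x95` and references `tblib.pickling_support`. Below is a minimal sketch of how such a payload could be turned back into readable text. It uses a stand-in `OSError` pickled locally rather than the real bytes, and the helper name `describe_pickled_exception` is hypothetical. Decoding the real payload would presumably need an environment where tblib, botocore and aiohttp are importable, and `pickle.loads` should only be used on data from a trusted source.

```python
import pickle
import traceback

# Stand-in payload: pickle an OSError with errno 24 locally, instead of
# pasting the real bytes from the log.
payload = pickle.dumps(OSError(24, "Too many open files"), protocol=5)


def describe_pickled_exception(data: bytes) -> str:
    """Unpickle an exception and format it as a normal traceback string."""
    exc = pickle.loads(data)
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )


print(describe_pickled_exception(payload))
```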

\x00\x00\x00\x00\x00\x00\x8c\x16tblib.pickling_support\x94\x8c\x1dunpickle_exception_with_attrs\x94\x93\x94(\x8c\x08builtins\x94\x8c\x0cRuntimeError\x94\x93\x94}\x94(\x8c\x08__dict__\x94}\x94\x8c\x04args\x94\x8c\xfaError during deserialization of the task graph. This frequently\noccurs if the Scheduler and Client have different environments.\nFor more information, see\nhttps://docs.dask.org/en/stable/deployment-considerations.html#consistent-software-environments\n\x94\x85\x94uh\x00\x8c\x12unpickle_exception\x94\x93\x94(\x8c\x13botocore.exceptions\x94\x8c\x1b_exception_from_packed_args\x94\x93\x94h\x0e\x8c\x17EndpointConnectionError\x94\x93\x94N}\x94(\x8c\x0cendpoint_url\x94\x8c\x90https://imos-data.s3.ap-southeast-2.amazonaws.com/IMOS/SRS/SST/ghrsst/L3SM-1d/dn/2012/20120430092000-ABOM-L3S_GHRSST-SSTfnd-MultiSensor-1d_dn.nc\x94\x8c\x05error\x94h\x02(\x8c\x19aiohttp.client_exceptions\x94\x8c\x17ClientConnectorDNSError\x94\x93\x94}\x94(h\x07}\x94(\x8c\t_conn_key\x94\x8c\x15aiohttp.client_reqrep\x94\x8c\rConnectionKey\x94\x93\x94(\x8c)imos-data.s3.ap-southeast-2.amazonaws.com\x94M\xbb\x01\x88\x88NNNt\x94\x81\x94\x8c\t_os_error\x94h\r(h\x03\x8c\x07OSError\x94\x93\x94K\x18\x8c\x13Too many open 
files\x94\x86\x94Nh\x00\x8c\x12unpickle_traceback\x94\x93\x94\x8c\x05tblib\x94\x8c\x05Frame\x94\x93\x94)\x81\x94}\x94(\x8c\x08f_locals\x94}\x94\x8c\tf_globals\x94}\x94(\x8c\x08__name__\x94\x8c\x11aiohttp.connector\x94\x8c\x08__file__\x94\x8cA/opt/coiled/env/lib/python3.12/site-packages/aiohttp/connector.py\x94u\x8c\x06f_code\x94h*\x8c\x04Code\x94\x93\x94)\x81\x94}\x94(\x8c\x0bco_filename\x94h6\x8c\x07co_name\x94\x8c\x19_create_direct_connection\x94\x8c\x0bco_argcount\x94K\x00\x8c\x11co_kwonlyargcount\x94K\x00\x8c\x0bco_varnames\x94)\x8c\nco_nlocals\x94K\x00\x8c\x0cco_stacksize\x94K\x00\x8c\x08co_flags\x94K@\x8c\x0eco_firstlineno\x94K\x00ub\x8c\x08f_lineno\x94M\x02\x06ubM\xfc\x05h*\x8c\tTraceback\x94\x93\x94)\x81\x94}\x94(\x8c\x08tb_frame\x94h,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=\x8c\r_resolve_host\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM}\x04ub\x8c\ttb_lineno\x94M|\x04\x8c\x07tb_next\x94hH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=\x8c\x1b_resolve_host_with_throttle\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xab\x04ubhSM\x9b\x04hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x10aiohttp.resolver\x94h5\x8c@/opt/coiled/env/lib/python3.12/site-packages/aiohttp/resolver.py\x94uh7h9)\x81\x94}\x94(h<heh=\x8c\x07resolve\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK(ubhSK(hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x13asyncio.base_events\x94h5\x8c5/opt/coiled/env/lib/python3.12/asyncio/base_events.py\x94uh7h9)\x81\x94}\x94(h<hph=\x8c\x0bgetaddrinfo\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x89\x03ubhSM\x89\x03hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x19concurrent.futures.thread\x94h5\x8c;/opt/coiled/env/lib/python3.12/concurrent/futures/thread.py\x94uh7h9)\x81\x94}\x94(h<h{h=\x8c\x03run\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK?ubhSK;hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x06socket\x94h5\x8c(/opt/coil
ed/env/lib/python3.12/socket.py\x94uh7h9)\x81\x94}\x94(h<h\x86h=hsh?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xd2\x03ubhSM\xd2\x03ubububububub\x87\x94R\x94N\x89Nt\x94R\x94uh\th"h\x8c\x86\x94\x8c\x05errno\x94K\x18\x8c\x08strerror\x94h&uh\x8ch)h,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x17aiobotocore.httpsession\x94h5\x8cG/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/httpsession.py\x94uh7h9)\x81\x94}\x94(h<h\x95h=\x8c\x04send\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x16\x01ubK\xe0hH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x0eaiohttp.client\x94h5\x8c>/opt/coiled/env/lib/python3.12/site-packages/aiohttp/client.py\x94uh7h9)\x81\x94}\x94(h<h\xa0h=\x8c\x08_request\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xa6\x03ubhSM\x0b\x03hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\x9fh5h\xa0uh7h9)\x81\x94}\x94(h<h\xa0h=\x8c\x19_connect_and_send_request\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xe1\x02ubhSM\xde\x02hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=\x8c\x07connect\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x88\x02ubhSM\x82\x02hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=\x8c\x12_create_connection\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xb9\x04ubhSM\xb9\x04hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=h>h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x02\x06ubhSM\x02\x06ububububub\x87\x94R\x94h\x8c\x88N)t\x94R\x94h\x1bbu\x87\x94Nh)h,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x15distributed.scheduler\x94h5\x8cE/opt/coiled/env/lib/python3.12/site-packages/distributed/scheduler.py\x94uh7h9)\x81\x94}\x94(h<h\xd1h=\x8c\x0cupdate_graph\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFMt\x13ubM\n\x13hH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x1edistributed.protocol.serialize\x94h5\x8cN/opt/coiled/env/lib/python3.12/site-packages/distributed/protocol/serialize.py\x94uh7
h9)\x81\x94}\x94(h<h\xdch=\x8c\x0bdeserialize\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xc4\x01ubhSM\xc4\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xdbh5h\xdcuh7h9)\x81\x94}\x94(h<h\xdch=\x8c\x0cpickle_loads\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKoubhSKohThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x1bdistributed.protocol.pickle\x94h5\x8cK/opt/coiled/env/lib/python3.12/site-packages/distributed/protocol/pickle.py\x94uh7h9)\x81\x94}\x94(h<h\xf0h=\x8c\x05loads\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKbubhSK]hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x1cxarray.backends.file_manager\x94h5\x8cL/opt/coiled/env/lib/python3.12/site-packages/xarray/backends/file_manager.py\x94uh7h9)\x81\x94}\x94(h<h\xfbh=\x8c\x0c__setstate__\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x17\x01ubhSM\x17\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xfah5h\xfbuh7h9)\x81\x94}\x94(h<h\xfbh=\x8c\x08__init__\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x94ubhSK\x94hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xfah5h\xfbuh7h9)\x81\x94}\x94(h<h\xfbh=\x8c\t_make_key\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xa7ubhSK\xa7hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xfah5h\xfbuh7h9)\x81\x94}\x94(h<h\xfbh=j\x07\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFMM\x01ubhSMM\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x0bfsspec.spec\x94h5\x8c;/opt/coiled/env/lib/python3.12/site-packages/fsspec/spec.py\x94uh7h9)\x81\x94}\x94(h<j \x01\x00\x00h=\x8c\x08__hash__\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x9f\x07ubhSM\x9f\x07hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x1f\x01\x00\x00h5j \x01\x00\x00uh7h9)\x81\x94}\x94(h<j 
\x01\x00\x00h=\x8c\x07details\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x85\x07ubhSM\x85\x07hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x0bfsspec.asyn\x94h5\x8c;/opt/coiled/env/lib/python3.12/site-packages/fsspec/asyn.py\x94uh7h9)\x81\x94}\x94(h<j4\x01\x00\x00h=\x8c\x07wrapper\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKvubhSKvhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j3\x01\x00\x00h5j4\x01\x00\x00uh7h9)\x81\x94}\x94(h<j4\x01\x00\x00h=\x8c\x04sync\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKgubhSKghThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j3\x01\x00\x00h5j4\x01\x00\x00uh7h9)\x81\x94}\x94(h<j4\x01\x00\x00h=\x8c\x07_runner\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK<ubhSK8hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\ts3fs.core\x94h5\x8c9/opt/coiled/env/lib/python3.12/site-packages/s3fs/core.py\x94uh7h9)\x81\x94}\x94(h<jQ\x01\x00\x00h=\x8c\x05_info\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xb9\x05ubhSM\xa5\x05hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3jP\x01\x00\x00h5jQ\x01\x00\x00uh7h9)\x81\x94}\x94(h<jQ\x01\x00\x00h=\x8c\x08_call_s3\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFMs\x01ubhSMs\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3jP\x01\x00\x00h5jQ\x01\x00\x00uh7h9)\x81\x94}\x94(h<jQ\x01\x00\x00h=\x8c\x0e_error_wrapper\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x92ubhSK\x92hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3jP\x01\x00\x00h5jQ\x01\x00\x00uh7h9)\x81\x94}\x94(h<jQ\x01\x00\x00h=jf\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x92ubhSKrhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x13aiobotocore.context\x94h5\x8cC/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/context.py\x94uh7h9)\x81\x94}\x94(h<jv\x01\x00\x00h=j7\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK$ubhSK$hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x12aiobotocore.client\x94h5\x8cB/opt/coiled/env/
lib/python3.12/site-packages/aiobotocore/client.py\x94uh7h9)\x81\x94}\x94(h<j\x80\x01\x00\x00h=\x8c\x0e_make_api_call\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x96\x01ubhSM\x96\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x7f\x01\x00\x00h5j\x80\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\x80\x01\x00\x00h=\x8c\r_make_request\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xb9\x01ubhSM\xb0\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x14aiobotocore.endpoint\x94h5\x8cD/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/endpoint.py\x94uh7h9)\x81\x94}\x94(h<j\x94\x01\x00\x00h=\x8c\r_send_request\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKxubhSKxhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x93\x01\x00\x00h5j\x94\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\x94\x01\x00\x00h=\x8c\x0c_needs_retry\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x18\x01ubhSM\x18\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x11aiobotocore.hooks\x94h5\x8cA/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/hooks.py\x94uh7h9)\x81\x94}\x94(h<j\xa8\x01\x00\x00h=\x8c\x05_emit\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKDubhSKDhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x14aiobotocore._helpers\x94h5\x8cD/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/_helpers.py\x94uh7h9)\x81\x94}\x94(h<j\xb3\x01\x00\x00h=\x8c\x11resolve_awaitable\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x06ubhSK\x06hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x18aiobotocore.retryhandler\x94h5\x8cH/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/retryhandler.py\x94uh7h9)\x81\x94}\x94(h<j\xbe\x01\x00\x00h=\x8c\x05_call\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKkubhSKkhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xb2\x01\x00\x00h5j\xb3\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xb3\x01\x00\x00h=j\xb6\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x06ubhSK\x06hThH)\x81\x94}\x9
4(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xbd\x01\x00\x00h5j\xbe\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xbe\x01\x00\x00h=j\xc1\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK~ubhSK~hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xbd\x01\x00\x00h5j\xbe\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xbe\x01\x00\x00h=\x8c\r_should_retry\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xa5ubhSK\xa5hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xb2\x01\x00\x00h5j\xb3\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xb3\x01\x00\x00h=j\xb6\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x06ubhSK\x06hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xbd\x01\x00\x00h5j\xbe\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xbe\x01\x00\x00h=j\xc1\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xaeubhSK\xaehThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x15botocore.retryhandler\x94h5\x8cE/opt/coiled/env/lib/python3.12/site-packages/botocore/retryhandler.py\x94uh7h9)\x81\x94}\x94(h<j\xf2\x01\x00\x00h=\x8c\x08__call__\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xf7ubhSK\xf7hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xf1\x01\x00\x00h5j\xf2\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xf2\x01\x00\x00h=\x8c\x17_check_caught_exception\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xa0\x01ubhSM\xa0\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x93\x01\x00\x00h5j\x94\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\x94\x01\x00\x00h=\x8c\x10_do_get_response\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xd0ubhSK\xc9hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x93\x01\x00\x00h5j\x94\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\x94\x01\x00\x00h=\x8c\x05_send\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM/\x01ubhSM/\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\x94h5h\x95uh7h9)\x81\x94}\x94(h<h\x95h=h\x98h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x16\x01ubhSM\x16\x01ububububububububububububububububububububububububububububu
bububububub\x87\x94R\x94h\xca\x89Nt\x94R\x94h)h,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xd0h5h\xd1uh7h9)\x81\x94}\x94(h<h\xd1h=h\xd4h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFMt\x13ubM\x13\x13N\x87\x94R\x94j\x1c\x02\x00\x00\x88N)t\x94R\x94.'.                                                                                                                                                                                                                                                                                   Traceback (most recent call last):                                                                                                                                                                                                                                                                                                                                                                                                                                                                          File "/home/ubuntu/github_repo/aodn_cloud_optimised/aodn_cloud_optimised/lib/GenericZarrHandler.py", line 1007, in publish_cloud_optimised_fileset_batch                                                                                                                                                                                                                                                                                                                                                     self._write_ds(ds, idx)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  File 
"/home/ubuntu/github_repo/aodn_cloud_optimised/aodn_cloud_optimised/lib/GenericZarrHandler.py", line 1786, in _write_ds                                                                                                                                                                                                                                                                                                                                                                                 self._append_zarr_store(ds)                                                                                                                                                                                                                                                                                                                                                                                                                                                                              File "/home/ubuntu/github_repo/aodn_cloud_optimised/aodn_cloud_optimised/lib/GenericZarrHandler.py", line 1840, in _append_zarr_store                                                                                                                                                                                                                                                                                                                                                                        ds.to_zarr(                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              File 
"/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/xarray/core/dataset.py", line 2292, in to_zarr                                                                                                                                                                                                                                                                                                                                                                            return to_zarr(  # type: ignore[call-overload,miscile "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/xarray/backends/api.py", line 2246, in to_zarr                                                                                                                                                                                                                                                                                                                                                                            writes = writer.syncile "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/xarray/backends/common.py", line 357, in sync                                                                                                                                                                                                                                                                                                                                                                             delayed_store = chunkmanager.storeile "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py", line 247, in store                                                                                                                                                                                                                                                                                                 
                                                                    return storeile "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/dask/array/core.py", line 1221, in store                                                                                                                                                                                                                                                                                                                                                                                  dask.compute(arrays, **kwargs)                                                                                                                                                                                                                                                                                                                                                                                                                                                                           File "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/dask/base.py", line 681, in compute                                                                                                                                                                                                                                                                                                                                                                                       results = schedule(expr, keys, **kwargsile "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/distributed/client.py", line 2416, in _gather                                                                                                                                                                                                                                                                                        
                                                                                     raise exception.with_traceback(traceback)                                                                                                                                                                                                                                                                                                                                                                                                                                                              Exception: b'\x80\x05\x95\xed \x00\x00\x00\x00\x00\x00\x8c\x16tblib.pickling_support\x94\x8c\x1dunpickle_exception_with_attrs\x94\x93\x94(\x8c\x08builtins\x94\x8c\x0cRuntimeError\x94\x93\x94}\x94(\x8c\x08__dict__\x94}\x94\x8c\x04args\x94\x8c\xfaError during deserialization of the task graph. This frequently\noccurs if the Scheduler and Client have different environments.\nFor more information, see\nhttps://docs.dask.org/en/stable/deployment-considerations.html#consistent-software-environments\n\x94\x85\x94uh\x00\x8c\x12unpickle_exception\x94\x93\x94(\x8c\x13botocore.exceptions\x94\x8c\x1b_exception_from_packed_args\x94\x93\x94h\x0e\x8c\x17EndpointConnectionError\x94\x93\x94N}\x94(\x8c\x0cendpoint_url\x94\x8c\x90https://imos-data.s3.ap-southeast-2.amazonaws.com/IMOS/SRS/SST/ghrsst/L3SM-1d/dn/2012/20120430092000-ABOM-L3S_GHRSST-SSTfnd-MultiSensor-1d_dn.nc\x94\x8c\x05error\x94h\x02(\x8c\x19aiohttp.client_exceptions\x94\x8c\x17ClientConnectorDNSError\x94\x93\x94}\x94(h\x07}\x94(\x8c\t_conn_key\x94\x8c\x15aiohttp.client_reqrep\x94\x8c\rConnectionKey\x94\x93\x94(\x8c)imos-data.s3.ap-southeast-2.amazonaws.com\x94M\xbb\x01\x88\x88NNNt\x94\x81\x94\x8c\t_os_error\x94h\r(h\x03\x8c\x07OSError\x94\x93\x94K\x18\x8c\x13Too many open 
files\x94\x86\x94Nh\x00\x8c\x12unpickle_traceback\x94\x93\x94\x8c\x05tblib\x94\x8c\x05Frame\x94\x93\x94)\x81\x94}\x94(\x8c\x08f_locals\x94}\x94\x8c\tf_globals\x94}\x94(\x8c\x08__name__\x94\x8c\x11aiohttp.connector\x94\x8c\x08__file__\x94\x8cA/opt/coiled/env/lib/python3.12/site-packages/aiohttp/connector.py\x94u\x8c\x06f_code\x94h*\x8c\x04Code\x94\x93\x94)\x81\x94}\x94(\x8c\x0bco_filename\x94h6\x8c\x07co_name\x94\x8c\x19_create_direct_connection\x94\x8c\x0bco_argcount\x94K\x00\x8c\x11co_kwonlyargcount\x94K\x00\x8c\x0bco_varnames\x94)\x8c\nco_nlocals\x94K\x00\x8c\x0cco_stacksize\x94K\x00\x8c\x08co_flags\x94K@\x8c\x0eco_firstlineno\x94K\x00ub\x8c\x08f_lineno\x94M\x02\x06ubM\xfc\x05h*\x8c\tTraceback\x94\x93\x94)\x81\x94}\x94(\x8c\x08tb_frame\x94h,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=\x8c\r_resolve_host\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM}\x04ub\x8c\ttb_lineno\x94M|\x04\x8c\x07tb_next\x94hH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=\x8c\x1b_resolve_host_with_throttle\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xab\x04ubhSM\x9b\x04hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x10aiohttp.resolver\x94h5\x8c@/opt/coiled/env/lib/python3.12/site-packages/aiohttp/resolver.py\x94uh7h9)\x81\x94}\x94(h<heh=\x8c\x07resolve\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK(ubhSK(hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x13asyncio.base_events\x94h5\x8c5/opt/coiled/env/lib/python3.12/asyncio/base_events.py\x94uh7h9)\x81\x94}\x94(h<hph=\x8c\x0bgetaddrinfo\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x89\x03ubhSM\x89\x03hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x19concurrent.futures.thread\x94h5\x8c;/opt/coiled/env/lib/python3.12/concurrent/futures/thread.py\x94uh7h9)\x81\x94}\x94(h<h{h=\x8c\x03run\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK?ubhSK;hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x06socket\x94h5\x8c(/opt/coil
ed/env/lib/python3.12/socket.py\x94uh7h9)\x81\x94}\x94(h<h\x86h=hsh?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xd2\x03ubhSM\xd2\x03ubububububub\x87\x94R\x94N\x89Nt\x94R\x94uh\th"h\x8c\x86\x94\x8c\x05errno\x94K\x18\x8c\x08strerror\x94h&uh\x8ch)h,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x17aiobotocore.httpsession\x94h5\x8cG/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/httpsession.py\x94uh7h9)\x81\x94}\x94(h<h\x95h=\x8c\x04send\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x16\x01ubK\xe0hH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x0eaiohttp.client\x94h5\x8c>/opt/coiled/env/lib/python3.12/site-packages/aiohttp/client.py\x94uh7h9)\x81\x94}\x94(h<h\xa0h=\x8c\x08_request\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xa6\x03ubhSM\x0b\x03hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\x9fh5h\xa0uh7h9)\x81\x94}\x94(h<h\xa0h=\x8c\x19_connect_and_send_request\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xe1\x02ubhSM\xde\x02hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=\x8c\x07connect\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x88\x02ubhSM\x82\x02hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=\x8c\x12_create_connection\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xb9\x04ubhSM\xb9\x04hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h4h5h6uh7h9)\x81\x94}\x94(h<h6h=h>h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x02\x06ubhSM\x02\x06ububububub\x87\x94R\x94h\x8c\x88N)t\x94R\x94h\x1bbu\x87\x94Nh)h,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x15distributed.scheduler\x94h5\x8cE/opt/coiled/env/lib/python3.12/site-packages/distributed/scheduler.py\x94uh7h9)\x81\x94}\x94(h<h\xd1h=\x8c\x0cupdate_graph\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFMt\x13ubM\n\x13hH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x1edistributed.protocol.serialize\x94h5\x8cN/opt/coiled/env/lib/python3.12/site-packages/distributed/protocol/serialize.py\x94uh7
h9)\x81\x94}\x94(h<h\xdch=\x8c\x0bdeserialize\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xc4\x01ubhSM\xc4\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xdbh5h\xdcuh7h9)\x81\x94}\x94(h<h\xdch=\x8c\x0cpickle_loads\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKoubhSKohThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x1bdistributed.protocol.pickle\x94h5\x8cK/opt/coiled/env/lib/python3.12/site-packages/distributed/protocol/pickle.py\x94uh7h9)\x81\x94}\x94(h<h\xf0h=\x8c\x05loads\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKbubhSK]hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x1cxarray.backends.file
_manager\x94h5\x8cL/opt/coiled/env/lib/python3.12/site-packages/xarray/backends/file_manager.py\x94uh7h9)\x81\x94}\x94(h<h\xfbh=\x8c\x0c__setstate__\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x17\x01ubhSM\x17\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xfah5h\xfbuh7h9)\x81\x94}\x94(h<h\xfbh=\x8c\x08__init__\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x94ubhSK\x94hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xfah5h\xfbuh7h9)\x81\x94}\x94(h<h\xfbh
=\x8c\t_make_key\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xa7ubhSK\xa7hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xfah5h\xfbuh7h9)\x81\x94}\x94(h<h\xfbh=j\x07\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFMM\x01ubhSMM\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x0bfsspec.spec\x94h5\x8c;/opt/coiled/env/lib/python3.12/site-packages/fsspec/spec.py\x94uh7h9)\x81\x94}\x94(h<j \x01\x00\x00h=\x8c\x08__hash__\x94h?K\x00h@K\x00hA)hBK\x00hCK\x0
0hDK@hEK\x00ubhFM\x9f\x07ubhSM\x9f\x07hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x1f\x01\x00\x00h5j \x01\x00\x00uh7h9)\x81\x94}\x94(h<j \x01\x00\x00h=\x8c\x07details\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x85\x07ubhSM\x85\x07hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x0bfsspec.asyn\x94h5\x8c;/opt/coiled/env/lib/python3.12/site-packages/fsspec/asyn.py\x94uh7h9)\x81\x94}\x94(h<j4\x01\x00\x00h=\x8c\x07wrapper\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@
hEK\x00ubhFKvubhSKvhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j3\x01\x00\x00h5j4\x01\x00\x00uh7h9)\x81\x94}\x94(h<j4\x01\x00\x00h=\x8c\x04sync\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKgubhSKghThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j3\x01\x00\x00h5j4\x01\x00\x00uh7h9)\x81\x94}\x94(h<j4\x01\x00\x00h=\x8c\x07_runner\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK<ubhSK8hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\ts3fs.core\x94h5\x8c9/o
pt/coiled/env/lib/python3.12/site-packages/s3fs/core.py\x94uh7h9)\x81\x94}\x94(h<jQ\x01\x00\x00h=\x8c\x05_info\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xb9\x05ubhSM\xa5\x05hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3jP\x01\x00\x00h5jQ\x01\x00\x00uh7h9)\x81\x94}\x94(h<jQ\x01\x00\x00h=\x8c\x08_call_s3\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFMs\x01ubhSMs\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3jP\x01\x00\x00h5jQ\x01\x00\x00uh7h9)\x81\x94}\x94
(h<jQ\x01\x00\x00h=\x8c\x0e_error_wrapper\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x92ubhSK\x92hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3jP\x01\x00\x00h5jQ\x01\x00\x00uh7h9)\x81\x94}\x94(h<jQ\x01\x00\x00h=jf\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x92ubhSKrhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x13aiobotocore.context\x94h5\x8cC/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/context.py\x94uh7h9)\x81\x94}\x94(h<jv\x01
\x00\x00h=j7\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK$ubhSK$hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x12aiobotocore.client\x94h5\x8cB/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/client.py\x94uh7h9)\x81\x94}\x94(h<j\x80\x01\x00\x00h=\x8c\x0e_make_api_call\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x96\x01ubhSM\x96\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x7f\x01\x00\x00h5j\x80\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\x
80\x01\x00\x00h=\x8c\r_make_request\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xb9\x01ubhSM\xb0\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x14aiobotocore.endpoint\x94h5\x8cD/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/endpoint.py\x94uh7h9)\x81\x94}\x94(h<j\x94\x01\x00\x00h=\x8c\r_send_request\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKxubhSKxhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x93\x01\x00\x00h5j\x94\x01\x00\x00uh7h9)\x8
1\x94}\x94(h<j\x94\x01\x00\x00h=\x8c\x0c_needs_retry\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x18\x01ubhSM\x18\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x11aiobotocore.hooks\x94h5\x8cA/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/hooks.py\x94uh7h9)\x81\x94}\x94(h<j\xa8\x01\x00\x00h=\x8c\x05_emit\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKDubhSKDhThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x14aiobotocore._helpers\x94h5\x8cD/
opt/coiled/env/lib/python3.12/site-packages/aiobotocore/_helpers.py\x94uh7h9)\x81\x94}\x94(h<j\xb3\x01\x00\x00h=\x8c\x11resolve_awaitable\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x06ubhSK\x06hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x18aiobotocore.retryhandler\x94h5\x8cH/opt/coiled/env/lib/python3.12/site-packages/aiobotocore/retryhandler.py\x94uh7h9)\x81\x94}\x94(h<j\xbe\x01\x00\x00h=\x8c\x05_call\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFKkubhSKkhThH
)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xb2\x01\x00\x00h5j\xb3\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xb3\x01\x00\x00h=j\xb6\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x06ubhSK\x06hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xbd\x01\x00\x00h5j\xbe\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xbe\x01\x00\x00h=j\xc1\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK~ubhSK~hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xbd\x01\x00\x00h5j\xbe\x01
\x00\x00uh7h9)\x81\x94}\x94(h<j\xbe\x01\x00\x00h=\x8c\r_should_retry\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xa5ubhSK\xa5hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xb2\x01\x00\x00h5j\xb3\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xb3\x01\x00\x00h=j\xb6\x01\x00\x00h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\x06ubhSK\x06hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xbd\x01\x00\x00h5j\xbe\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xbe\x01\x00\x00h=j\xc1\x01\x00\x00h
?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xaeubhSK\xaehThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3\x8c\x15botocore.retryhandler\x94h5\x8cE/opt/coiled/env/lib/python3.12/site-packages/botocore/retryhandler.py\x94uh7h9)\x81\x94}\x94(h<j\xf2\x01\x00\x00h=\x8c\x08__call__\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xf7ubhSK\xf7hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\xf1\x01\x00\x00h5j\xf2\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\xf2\x01\x00\x00h=\x8c\x17_ch
eck_caught_exception\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\xa0\x01ubhSM\xa0\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x93\x01\x00\x00h5j\x94\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\x94\x01\x00\x00h=\x8c\x10_do_get_response\x94h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFK\xd0ubhSK\xc9hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3j\x93\x01\x00\x00h5j\x94\x01\x00\x00uh7h9)\x81\x94}\x94(h<j\x94\x01\x00\x00h=\x8c\x05_send\x94h?K\x00h@K\x00hA)hBK\x00hCK\x0
0hDK@hEK\x00ubhFM/\x01ubhSM/\x01hThH)\x81\x94}\x94(hKh,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\x94h5h\x95uh7h9)\x81\x94}\x94(h<h\x95h=h\x98h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFM\x16\x01ubhSM\x16\x01ubububububububububububububububububububububububububububububububububub\x87\x94R\x94h\xca\x89Nt\x94R\x94h)h,)\x81\x94}\x94(h/}\x94h1}\x94(h3h\xd0h5h\xd1uh7h9)\x81\x94}\x94(h<h\xd1h=h\xd4h?K\x00h@K\x00hA)hBK\x00hCK\x00hDK@hEK\x00ubhFMt\x13ubM\x13\x13N\x87\x94R\x94j\x1c\x02\x00\x00\x88N)t\x94R\
x94.'

Another error:

2025-11-27 06:18:18,481 - ERROR - GenericZarrHandler.py:1020 - publish_cloud_optimised_fileset_batch - 39b496bb-31ad-45e1-9ede-6fca2888f8c7: An unexpected error occurred during batch 2 processing: b'\x80\x05\x95\x92\x0b\x00\x00\x00\x00\x00\x00\x8c\x16tblib.pickling_support\x94\x8c\x1dunpickle_exception_with_attrs\x94\x93\x94(\x8c\x1fdistributed.shuffle._exceptions\x94\x8c\x13P2PConsistencyError\x94\x93\x94}\x94(\x8c\x08__dict__\x94}\x94\x8c\x04args\x94\x8cBNo active shuffle with id=\'dbe46e5700b3cd9c0e51aa5b1ec8602d\' found\x94\x85\x94uh\x02(\x8c\x08builtins\x94\x8c\x08KeyError\x94\x93\x94}\x94(h\x07}\x94h\t\x8c dbe46e5700b3cd9c0e51aa5b1ec8602d\x94\x85\x94uNh\x00\x8c\x12unpickle_traceback\x94\x93\x94\x8c\x05tblib\x94\x8c\x05Frame\x94\x93\x94)\x81\x94}\x94(\x8c\x08f_locals\x94}\x94\x8c\tf_globals\x94}\x94(\x8c\x08__name__\x94\x8c%distributed.shuffle._scheduler_plugin\x94\x8c\x08__file__\x94\x8cU/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_scheduler_plugin.py\x94u\x8c\x06f_code\x94h\x15\x8c\x04Code\x94\x93\x94)\x81\x94}\x94(\x8c\x0bco_filename\x94h!\x8c\x07co_name\x94\x8c\x03get\x94\x8c\x0bco_argcount\x94K\x00\x8c\x11co_kwonlyargcount\x94K\x00\x8c\x0bco_varnames\x94)\x8c\nco_nlocals\x94K\x00\x8c\x0cco_stacksize\x94K\x00\x8c\x08co_flags\x94K@\x8c\x0eco_firstlineno\x94K\x00ub\x8c\x08f_lineno\x94K\xafubK\xafh\x15\x8c\tTraceback\x94\x93\x94)\x81\x94}\x94(\x8c\x08tb_frame\x94h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(h\x1eh\x1fh 
h!uh"h$)\x81\x94}\x94(h\'h!h(\x8c\x04_get\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xbeub\x8c\ttb_lineno\x94K\xbeub\x87\x94R\x94N\x89N)t\x94R\x94h\x10bh\x14h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(\x8c\x08__name__\x94\x8c\x12distributed.worker\x94\x8c\x08__file__\x94\x8cB/opt/coiled/env/lib/python3.12/site-packages/distributed/worker.py\x94uh"h$)\x81\x94}\x94(h\'hJh(\x8c\x10_run_task_simple\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\xb7\x0bubM\xaa\x0bh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c\x0fdask._task_spec\x94hI\x8c?/opt/coiled/env/lib/python3.12/site-packages/dask/_task_spec.py\x94uh"h$)\x81\x94}\x94(h\'hUh(\x8c\x08__call__\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\xf7\x02ubh>M\xf7\x02\x8c\x07tb_next\x94h3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c\x19distributed.shuffle._core\x94hI\x8cI/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_core.py\x94uh"h$)\x81\x94}\x94(h\'hah(\x8c\x0bp2p_barrier\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1MB\x02ubh>M>\x02hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c"distributed.shuffle._worker_plugin\x94hI\x8cR/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_worker_plugin.py\x94uh"h$)\x81\x94}\x94(h\'hlh(\x8c\x07barrier\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\x87\x01ubh>M\x87\x01hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c\x11distributed.utils\x94hI\x8cA/opt/coiled/env/lib/python3.12/site-packages/distributed/utils.py\x94uh"h$)\x81\x94}\x94(h\'hwh(\x8c\x04sync\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\xc4\x01ubh>M\xc4\x01hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhvhIhwuh"h$)\x81\x94}\x94(h\'hwh(\x8c\x01f\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\xae\x01ubh>M\xaa\x01hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c\x0btornado.gen\x94hI\x8c;/opt/coiled/env/lib/python3.12/site-packages/tornado/gen.py\x94uh"h$)\
x81\x94}\x94(h\'h\x8bh(\x8c\x03run\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M6\x03ubh>M\x0f\x03hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x08_barrier\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1Mj\x01ubh>Mj\x01hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x0fget_most_recent\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xb1ubh>K\xb1hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x0fget_with_run_id\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1Kwubh>KwhYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x08_refresh\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xdeubh>K\xdehYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x06_fetch\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xc8ubh>K\xc8hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(\x8c\x08__name__\x94\x8c%distributed.shuffle._scheduler_plugin\x94\x8c\x08__file__\x94\x8cU/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_scheduler_plugin.py\x94uh"h$)\x81\x94}\x94(h\'h\xc5h(\x8c\x03get\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xb2ubh>K\xb2ubububububububububububub\x87\x94R\x94hB\x88N)t\x94R\x94h\x08b.'.                             
Traceback (most recent call last):
  File "/home/ubuntu/github_repo/aodn_cloud_optimised/aodn_cloud_optimised/lib/GenericZarrHandler.py", line 1007, in publish_cloud_optimised_fileset_batch
    self._write_ds(ds, idx)
  File "/home/ubuntu/github_repo/aodn_cloud_optimised/aodn_cloud_optimised/lib/GenericZarrHandler.py", line 1786, in _write_ds
    self._append_zarr_store(ds)
  File "/home/ubuntu/github_repo/aodn_cloud_optimised/aodn_cloud_optimised/lib/GenericZarrHandler.py", line 1840, in _append_zarr_store
    ds.to_zarr(
  File "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/xarray/core/dataset.py", line 2292, in to_zarr
    return to_zarr(  # type: ignore[call-overload,misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/xarray/backends/api.py", line 2246, in to_zarr
    writes = writer.sync(
             ^^^^^^^^^^^^
  File "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/xarray/backends/common.py", line 357, in sync
    delayed_store = chunkmanager.store(
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py", line 247, in store
    return store(
           ^^^^^^
  File "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/dask/array/core.py", line 1221, in store
    dask.compute(arrays, **kwargs)
  File "/home/ubuntu/miniforge3/envs/AodnCloudOptimised/lib/python3.12/site-packages/dask/base.py", line 681, in compute
    results = schedule(expr, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_core.py", line 574, in p2p_barrier
  File "/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_worker_plugin.py", line 391, in barrier
  File "/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_worker_plugin.py", line 362, in _barrier
  File "/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_worker_plugin.py", line 177, in get_most_recent
  File "/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_worker_plugin.py", line 119, in get_with_run_id
  File "/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_worker_plugin.py", line 222, in _refresh
  File "/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_worker_plugin.py", line 200, in _fetch
  File "/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_scheduler_plugin.py", line 178, in get
Exception: 
b'\x80\x05\x95\x92\x0b\x00\x00\x00\x00\x00\x00\x8c\x16tblib.pickling_support\x94\x8c\x1dunpickle_exception_with_attrs\x94\x93\x94(\x8c\x1fdistributed.shuffle._exceptions\x94\x8c\x13P2PConsistencyError\x94\x93\x94}\x94(\x8c\x08__dict__\x94}\x94\x8c\x04args\x94\x8cBNo active shuffle with id=\'dbe46e5700b3cd9c0e51aa5b1ec8602d\' found\x94\x85\x94uh\x02(\x8c\x08builtins\x94\x8c\x08KeyError\x94\x93\x94}\x94(h\x07}\x94h\t\x8c dbe46e5700b3cd9c0e51aa5b1ec8602d\x94\x85\x94uNh\x00\x8c\x12unpickle_traceback\x94\x93\x94\x8c\x05tblib\x94\x8c\x05Frame\x94\x93\x94)\x81\x94}\x94(\x8c\x08f_locals\x94}\x94\x8c\tf_globals\x94}\x94(\x8c\x08__name__\x94\x8c%distributed.shuffle._scheduler_plugin\x94\x8c\x08__file__\x94\x8cU/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_scheduler_plugin.py\x94u\x8c\x06f_code\x94h\x15\x8c\x04Code\x94\x93\x94)\x81\x94}\x94(\x8c\x0bco_filename\x94h!\x8c\x07co_name\x94\x8c\x03get\x94\x8c\x0bco_argcount\x94K\x00\x8c\x11co_kwonlyargcount\x94K\x00\x8c\x0bco_varnames\x94)\x8c\nco_nlocals\x94K\x00\x8c\x0cco_stacksize\x94K\x00\x8c\x08co_flags\x94K@\x8c\x0eco_firstlineno\x94K\x00ub\x8c\x08f_lineno\x94K\xafubK\xafh\x15\x8c\tTraceback\x94\x93\x94)\x81\x94}\x94(\x8c\x08tb_frame\x94h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(h\x1eh\x1fh 
h!uh"h$)\x81\x94}\x94(h\'h!h(\x8c\x04_get\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xbeub\x8c\ttb_lineno\x94K\xbeub\x87\x94R\x94N\x89N)t\x94R\x94h\x10bh\x14h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(\x8c\x08__name__\x94\x8c\x12distributed.worker\x94\x8c\x08__file__\x94\x8cB/opt/coiled/env/lib/python3.12/site-packages/distributed/worker.py\x94uh"h$)\x81\x94}\x94(h\'hJh(\x8c\x10_run_task_simple\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\xb7\x0bubM\xaa\x0bh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c\x0fdask._task_spec\x94hI\x8c?/opt/coiled/env/lib/python3.12/site-packages/dask/_task_spec.py\x94uh"h$)\x81\x9}\x94(h\'hUh(\x8c\x08__call__\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\xf7\x02ubh>M\xf7\x02\x8c\x07tb_next\x94h3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c\x19distributed.shuffle._core\x94hI\x8cI/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_core.py\x94uh"h$)\x81\x94}\x94(h\'hah(\x8c\x0bp2p_barrier\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1MB\x02ubh>M>\x02hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c"distributed.shuffle._worker_plugin\x94hI\x8cR/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_worker_plugin.py\x94uh"h$)\x81\x94}\x94(h\'hlh(\x8c\x07barrier\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\x87\x01ubh>M\x87\x01hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c\x11distributed.utils\x94hI\x8cA/opt/coiled/env/lib/python3.12/site-packages/distributed/utils.py\x94uh"h$)\x81\x94}\x94(h\'hwh(\x8c\x04sync\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\xc4\x01ubh>M\xc4\x01hYh
3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhvhIhwuh"h$)\x81\x94}\x94(h\'hwh(\x8c\x01f\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M\xae\x01ubh>M\xaa\x01hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hG\x8c\x
0btornado.gen\x94hI\x8c;/opt/coiled/env/lib/python3.12/site-packages/tornado/gen.py\x94uh"h$)\x81\x94}\x94(h\'h\x8bh(\x8c\x03run\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1M6\x03ubh>M\x0f\x03hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\
x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x08_barrier\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1Mj\x01ubh>Mj\x01hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x0fget_most_
recent\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xb1ubh>K\xb1hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x0fget_with_run_id\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1Kwubh
>KwhYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGhkhIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x08_refresh\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xdeubh>K\xdehYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(hGh
khIhluh"h$)\x81\x94}\x94(h\'hlh(\x8c\x06_fetch\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xc8ubh>K\xc8hYh3)\x81\x94}\x94(h6h\x17)\x81\x94}\x94(h\x1a}\x94h\x1c}\x94(\x8c\x08__name__\x94\x8c%distributed.shuffle._scheduler_plugin\x94\x8c\x0
8__file__\x94\x8cU/opt/coiled/env/lib/python3.12/site-packages/distributed/shuffle/_scheduler_plugin.py\x94uh"h$)\x81\x94}\x94(h\'h\xc5h(\x8c\x03get\x94h*K\x00h+K\x00h,)h-K\x00h.K\x00h/K@h0K\x00ubh1K\xb2ubh>K\xb2ubububububububububububub\x87\x94R
\x94hB\x88N)t\x94R\x94h\x08b.'

The only way I can digest this is to use an AI.
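For readers facing a similar dump: the bytes above start with `\x80\x05\x95`, which is the header of a pickle protocol 5 stream, so the readable parts can be pulled out with the standard library alone. The following is a minimal sketch, not a Dask feature; `readable_strings` is a hypothetical helper name, and it relies only on `pickletools.genops`, which walks the opcodes without executing the pickle (unlike `pickle.loads`, which should not be run on untrusted bytes).

```python
import pickle
import pickletools


def readable_strings(payload: bytes) -> list[str]:
    """Collect every string argument found in a pickle byte stream.

    The stream is only disassembled, never executed, so nothing is imported
    or reconstructed. `readable_strings` is a hypothetical helper name.
    """
    return [
        arg
        for _opcode, arg, _pos in pickletools.genops(payload)
        if isinstance(arg, str)
    ]


if __name__ == "__main__":
    # Stand-in payload; in practice this would be the bytes object from the log.
    demo = pickle.dumps(RuntimeError("Too many open files"), protocol=5)
    for text in readable_strings(demo):
        print(text)
```

Applied to a complete dump like the ones in this issue, this should surface the exception class names, messages and file paths in the order they were pickled. A truncated dump may make `genops` raise partway through.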

With the Coiled dashboard, I can see that most of the time neither the workers nor the scheduler show any errors, and memory/CPU usage looks healthy.

Anything else we need to know?:

Environment:

  • Dask version: distributed 2025.10.0
  • Python version:
  • Operating System:
  • Install method (conda, pip, source):

lbesnard avatar Nov 27 '25 06:11 lbesnard

It looks like there are two problems here:

  1. You're getting an unpickling error on the scheduler
  2. The exception is being mangled

Usually unpickling errors happen when you have a different software environment on your client and scheduler/workers. Most likely a different Python version. Given that you're using Coiled I suggest you reach out to their support to help with this.

I'll leave this open though because the exception mangling isn't great. When you get your environment issues resolved could you comment back here to let us know what it was as that might give us a clue to what is happening.

jacobtomlinson avatar Nov 27 '25 12:11 jacobtomlinson

@jacobtomlinson Thanks a lot for your help.

TL;DR: env diff was the problem!


As suggested, I fixed my software environment. First, I ran a poetry update on my package. But even after that, I would get this message when creating my cluster:

+---------+--------+-----------+---------+
| Package | Client | Scheduler | Workers |
+---------+--------+-----------+---------+
| lz4     | 4.4.4  | 4.4.5     | 4.4.5   |
+---------+--------+-----------+---------+

Initially, I didn't care much about it and, to be honest, barely saw it: my code would start and output a lot of logs to my terminal. (My script would run for 10 minutes, sometimes 30, and then throw the logs I mentioned above. I spent maybe a week on this, trying various Dask configs, from p2p/tasks to other obscure options.) As it was only a patch-level difference in a package (lz4) I didn't even know about, I would just let my code proceed. I also assumed my poetry update should have fixed any package version mismatch anyway.

But my code failed again miserably.

I then decided to update lz4 on my client to 4.4.5. My code has been running for the last 4-5 hours without a single issue...

Now, two things I don't really get. My client is... a client! I don't quite understand why it has such an impact on the running code, but OK, I get it: some data needs to be serialised back from the scheduler to the client. My bigger problem is this: if a consistent environment across client, scheduler and workers is so important, why only raise a somewhat quiet warning? IMO, this should raise an error such as:

raise RuntimeError("Package version mismatch detected: client and worker versions do not match")

Is there an option to trigger this?

Having this enforced would be a massive quality-of-life improvement.

lbesnard avatar Dec 01 '25 03:12 lbesnard

It's a bit of a thorny problem that's been discussed in Dask for many years. Often things work fine with slight mismatches, so we don't want to fail too aggressively, but some core things like python, dask, distributed and anything related to serialization/compression, like lz4, can cause problems like the one you experienced. I did start some work on this in #5582, but it never got over the line. This was also discussed again in #7017.
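Until a stricter built-in check lands, a client-side guard is one possible workaround. The following is a minimal sketch, not part of Dask: `CRITICAL_PACKAGES`, `find_mismatches` and `require_consistent_environment` are hypothetical names, the package list simply mirrors the core packages named above, and the input is assumed to have the shape returned by `Client.get_versions()` (top-level `client`, `scheduler` and `workers` entries, each carrying a `packages` mapping).

```python
# Hypothetical list, taken from the packages named in this thread.
CRITICAL_PACKAGES = ("python", "dask", "distributed", "lz4")


def find_mismatches(versions, packages=CRITICAL_PACKAGES):
    """Return {package: {role: version}} for packages whose versions differ.

    `versions` is assumed to look like the dict returned by
    distributed's Client.get_versions().
    """
    roles = {
        "client": versions["client"]["packages"],
        "scheduler": versions["scheduler"]["packages"],
    }
    for address, info in versions.get("workers", {}).items():
        roles[f"worker {address}"] = info["packages"]

    mismatches = {}
    for package in packages:
        seen = {role: pkgs.get(package) for role, pkgs in roles.items()}
        if len(set(seen.values())) > 1:
            mismatches[package] = seen
    return mismatches


def require_consistent_environment(versions, packages=CRITICAL_PACKAGES):
    """Raise RuntimeError if any critical package differs between roles."""
    mismatches = find_mismatches(versions, packages)
    if mismatches:
        details = "; ".join(
            f"{package}: "
            + ", ".join(f"{role}={version}" for role, version in seen.items())
            for package, seen in mismatches.items()
        )
        raise RuntimeError(f"Package version mismatch detected: {details}")
```

Calling `require_consistent_environment(client.get_versions())` right after the client connects would stop the run before any work is submitted, instead of failing 10 or 30 minutes in.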

jacobtomlinson avatar Dec 01 '25 10:12 jacobtomlinson

@jacobtomlinson, thanks for the background. I had a look at https://github.com/dask/distributed/pull/5582

I imagine Dask developers rarely run into issues like this because they already know the quirks (such as the one discussed here) and understand why failures occur. For a user/consumer like me, using Dask largely as a black box, debugging is difficult enough that without deeper knowledge it’s very easy to head down the wrong rabbit holes.

I'm in favour of your PR, and maybe even a new Dask config option in distributed.yaml to cancel the creation of the cluster if a critical mismatch is detected.

lbesnard avatar Dec 02 '25 00:12 lbesnard