dask-histogram: KeyError when calling 'dask_histogram.boost.Histogram().fill()' with a Dask DataFrame

Dear experts, I am starting to use dask and dask_histogram, but I get an error when I try to fill a dask_histogram.boost Histogram from a Dask DataFrame, as below:

import numpy as np
import dask.dataframe as dd
import dask_histogram.boost as dhb

# this is reproducible
d = {
    'A': np.random.normal(0., 1., 100000),
    'W': np.random.uniform(0.2, 0.8, 100000),
}
ddf = dd.from_dict(d, npartitions=10)

h = dhb.Histogram(
    dhb.axis.Regular(10, -3, 3),
    storage=dhb.storage.Weight()
).fill(ddf['A'], weight=ddf['W']).compute()
print(h)

This example gives me:

Traceback (most recent call last):
  File "/gpfs/home/belle2/rlebouch/darkphotontodimuons/background_rejection/testdask.py", line 15, in <module>
    ).fill(ddf['A'], weight=ddf['W']).compute()
                                      ^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/base.py", line 653, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/base.py", line 422, in collections_to_dsk
    dsk = opt(dsk, keys, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask_histogram/core.py", line 514, in optimize
    dsk = fuse_roots(dsk, keys=keys)  # type: ignore
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 1564, in fuse_roots
    new = toolz.merge(layer, *[layers[dep] for dep in deps])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/toolz/dicttoolz.py", line 39, in merge
    rv.update(d)
  File "<frozen _collections_abc>", line 836, in __iter__
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 641, in __iter__
    return iter(self._dict)
                ^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 607, in _dict
    dsk = _make_blockwise_graph(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 958, in _make_blockwise_graph
    itertools.product(*[range(dims[i]) for i in out_indices])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/belle2/rlebouch/.local/lib/python3.11/site-packages/dask/blockwise.py", line 958, in <listcomp>
    itertools.product(*[range(dims[i]) for i in out_indices])
                              ~~~~^^^
KeyError: '.0'

Is it really possible to fill a histogram from a DataFrame?

I currently use:

- dask-histogram 2024.12.1
- dask 2024.12.1
- boost_histogram 1.4.1

Jan 08 '25 00:01 RobinTimTom

This problem stems from the new dask.dataframe backend, which is based on dask-expr; dask-histogram isn't compatible with it at this time. More info here: https://github.com/dask-contrib/dask-histogram/pull/130

The code will work with the Dask config environment variable DASK_DATAFRAME__QUERY_PLANNING=False, or with dask.config.set({"dataframe.query-planning": False}) in Python code.

Jan 09 '25 21:01 douglasdavis

I added your suggestion to my code, but it didn't help; I still get the same error message.

Jan 09 '25 23:01 RobinTimTom

Can you share more details? Did you export the environment variable or use the dask.config API?

Jan 10 '25 00:01 douglasdavis

I tried the dask.config API.

Jan 10 '25 01:01 RobinTimTom

Hmm, yeah, I can only make it work with the environment variable, not with the config API. Maybe it's an artifact of mixing dask-histogram and dask.dataframe; I'm not sure. That's probably a separate issue. Anyway, this is the workaround for now:

~/software/repos/dask-histogram main ❯ DASK_DATAFRAME__QUERY_PLANNING=False ipython
Python 3.12.8 (main, Dec  3 2024, 18:42:41) [Clang 16.0.0 (clang-1600.0.26.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.30.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np
   ...: import dask.dataframe as dd
   ...: import dask_histogram.boost as dhb
   ...:
   ...: # this is reproducible
   ...: d = {
   ...:     'A': np.random.normal(0., 1., 100000),
   ...:     'W': np.random.uniform(0.2, 0.8, 100000),
   ...: }
   ...: ddf = dd.from_dict(d, npartitions=10)
   ...:
   ...: h = dhb.Histogram(
   ...:     dhb.axis.Regular(10, -3, 3),
   ...:     storage=dhb.storage.Weight()
   ...: ).fill(ddf['A'], weight=ddf['W']).compute()
   ...: print(h)
/Users/ddavis/software/repos/dask-histogram/.venv/lib/python3.12/site-packages/dask/dataframe/__init__.py:31: FutureWarning: The legacy Dask DataFrame implementation is deprecated and will be removed in a future version. Set the configuration option `dataframe.query-planning` to `True` or None to enable the new Dask Dataframe implementation and silence this warning.
  warnings.warn(
                       ┌─────────────────────────────────────────────────────┐
[-inf,   -3) 66.92     │▎                                                    │
[  -3, -2.4) 357.9     │█▋                                                   │
[-2.4, -1.8) 1391      │██████▍                                              │
[-1.8, -1.2) 3997      │██████████████████▎                                  │
[-1.2, -0.6) 7929      │████████████████████████████████████▎                │
[-0.6,    0) 1.139e+04 │████████████████████████████████████████████████████ │
[   0,  0.6) 1.111e+04 │██████████████████████████████████████████████████▊  │
[ 0.6,  1.2) 8052      │████████████████████████████████████▊                │
[ 1.2,  1.8) 3914      │█████████████████▉                                   │
[ 1.8,  2.4) 1368      │██████▎                                              │
[ 2.4,    3) 324.1     │█▌                                                   │
[   3,  inf) 63.99     │▎                                                    │
                       └─────────────────────────────────────────────────────┘
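The bin contents in this printout are not integers because storage=dhb.storage.Weight() accumulates the sum of the weights in each bin (plus, separately, the sum of squared weights, which serves as the variance). Below is a minimal NumPy-only sketch of that weighted fill, useful as a cross-check on the same input arrays. `weighted_fill` is a hypothetical helper, and the `-inf`/`inf` edges stand in for the underflow and overflow rows. It draws its own random sample, so its numbers will not match the printout exactly.

```python
import numpy as np


def weighted_fill(x, w, bins=10, lo=-3.0, hi=3.0):
    """Hypothetical helper: per-bin sum of weights and of squared weights.

    Intended to mirror Regular(bins, lo, hi) with Weight() storage; returns
    arrays of length bins + 2 ordered [underflow, regular bins..., overflow].
    """
    inner = np.linspace(lo, hi, bins + 1)
    # Infinite outer edges collect everything below lo / at or above hi.
    edges = np.concatenate(([-np.inf], inner, [np.inf]))
    sum_w, _ = np.histogram(x, bins=edges, weights=w)
    sum_w2, _ = np.histogram(x, bins=edges, weights=w**2)
    return sum_w, sum_w2


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)
w = rng.uniform(0.2, 0.8, 100_000)
sum_w, sum_w2 = weighted_fill(x, w)
```

When the question's d['A'] and d['W'] arrays are passed instead, h.values(flow=True) and h.variances(flow=True) from the computed histogram should agree with sum_w and sum_w2 up to floating-point rounding at the bin edges.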

Jan 10 '25 01:01 douglasdavis
