Benjamin Zaitlen

Results 198 comments of Benjamin Zaitlen

With 24.04 out, I think this can be closed. @dantegd / @krunolp please re-open if you feel otherwise.

I'm seeing something odd with dtype handling in dask-cudf and *not* cudf. `cudf.set_index(...)` does the right thing with dtype handling and the newly created index has type `

I think I've got it in cudf, exactly where you said. In the `map_partitions` method we call `self._meta.set_index(colname)`, which eventually calls into https://github.com/rapidsai/cudf/blob/branch-0.6/python/cudf/dataframe/index.py#L541. `as_index` comes back with a...

Dask-CUDA will probably not handle this kind of automated cluster creation. Instead, we (@jacobtomlinson) have explored inferring hardware and auto-annotating the cluster in https://github.com/jacobtomlinson/dask-agent I...

@cjnolet is it too late to review this for 24.04? We'll retarget for 24.06, but we'd still like your review before merging.

That's a good idea, @jakirkham. I updated the test operations.

> Would this benchmark be better suited as a reproducible script, a one-off notebook with some plots to reflect the results, or both? Seems like it might be leaning towards...

This is really cool to see! @TomAugspurger, if you want me to handle the viz part once the measuring is in, I'd be happy to take that on.

@TomAugspurger Do you have any thoughts on how this should be visualized? For reference, we are currently tracking/visualizing time per key here: https://github.com/dask/distributed/pull/3933#issuecomment-651272161

I think we can do that with [Whiskers](https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#whiskers). Do you want me to do that after this PR?
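
For what it's worth, a whisker annotation needs a lower and an upper value per key, so something has to compute those from the raw timings first. Below is a rough sketch of one way that might look, using only the standard library. The 1.5 * IQR rule, the `whisker_bounds` helper, and the sample timings are all assumptions made up for illustration; none of them come from the PR.

```python
from statistics import quantiles


def whisker_bounds(durations):
    """Return (lower, upper) whisker ends for one key.

    Assumption: the common 1.5 * IQR box-plot rule, i.e. the whiskers
    reach the most extreme observations that still lie inside the fences.
    """
    q1, _, q3 = quantiles(durations, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [d for d in durations if low_fence <= d <= high_fence]
    return min(inside), max(inside)


# Hypothetical per-key timings in seconds (made-up illustration data).
timings = {
    "read_csv": [1.0, 1.2, 1.1, 1.3, 5.0],  # 5.0 falls outside the fences
    "sum": [0.2, 0.25, 0.22, 0.21],
}

bounds = {key: whisker_bounds(values) for key, values in timings.items()}
for key, (lower, upper) in bounds.items():
    print(f"{key}: lower={lower}, upper={upper}")
```

The resulting `lower` and `upper` values per key could then be placed in a data source and handed to the plotting layer; how that wiring is done would depend on the dashboard code in the PR.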