Gabe Joseph
_This is a duplicate of https://github.com/dask/dask/pull/8209, but I'm taking over from @fjetter since I may want to occasionally push small fixes here. Original message copied below, with edits._ This is...
**Minimal Complete Verifiable Example**:

```python
import dask.dataframe as dd

df = dd.demo.make_timeseries()

# Works with divisions
df.x.rolling("50D").mean().compute()
# timestamp
# 2000-01-31 00:00:00    0.637020
# 2000-01-31 00:00:10   -0.179649
# 2000-01-31 00:00:20...
```
Because `fuse_roots` materializes Blockwise layers, if you have a situation where you're sub-selecting a few items out of a large initial Blockwise array, culling the graph (cheap) before materializing it...
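To make the ordering argument concrete, here is a minimal sketch in plain Python. `LazyLayer` is a hypothetical toy stand-in for a lazily defined layer, not dask's `Blockwise` or `HighLevelGraph` API; it only illustrates why selecting keys before building tasks is cheaper than building every task and discarding most of them.

```python
class LazyLayer:
    """Hypothetical toy layer: tasks are described lazily and built on demand."""

    def __init__(self, name, func, n_blocks, keep=None):
        self.name = name
        self.func = func
        self.n_blocks = n_blocks
        # Block indices that will be built by materialize(); all blocks by default.
        if keep is None:
            self.keep = list(range(n_blocks))
        else:
            self.keep = sorted(i for i in keep if 0 <= i < n_blocks)

    def cull(self, keys):
        """Cheap: record which blocks are needed without building any task."""
        wanted = {i for (name, i) in keys if name == self.name}
        return LazyLayer(self.name, self.func, self.n_blocks, keep=wanted)

    def materialize(self):
        """Expensive: build one task tuple per retained block."""
        return {(self.name, i): (self.func, i) for i in self.keep}


layer = LazyLayer("x", abs, n_blocks=100_000)
wanted = [("x", 3), ("x", 7)]

# Materialize first, then cull: every task is built, most are thrown away.
full = layer.materialize()
late = {key: full[key] for key in wanted}

# Cull first, then materialize: only the two requested tasks are built.
early = layer.cull(wanted).materialize()
```

Both orderings produce the same two tasks, but the first builds the whole layer's task dictionary along the way, while the second never does.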
Perhaps a niche use-case, but sometimes I find myself SSHing into a box (say, a k8s node) that's running Python workloads in docker, but doesn't have pip installed on the...
The converse of https://github.com/benfred/py-spy/issues/229: I prefer speedscope format, so it would be nice to just be able to type `py-spy record -o profile.json` and not also have to add `-f...
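The requested behavior amounts to inferring the output format from the file extension when no format is given explicitly. A rough sketch of that logic follows, written in Python for illustration only (py-spy itself is written in Rust, and this is not its implementation). The function `infer_format`, the extension mapping, and the fallback default are all assumptions.

```python
from pathlib import Path

# Hypothetical extension-to-format mapping. "flamegraph" and "speedscope"
# are format names accepted by py-spy's -f/--format flag; the mapping
# itself is an assumption made for this sketch.
EXTENSION_TO_FORMAT = {
    ".svg": "flamegraph",
    ".json": "speedscope",
}


def infer_format(output_path, explicit_format=None, default="flamegraph"):
    """Pick an output format, preferring an explicitly requested one."""
    if explicit_format is not None:
        # An explicit -f/--format choice always wins over the extension.
        return explicit_format
    suffix = Path(output_path).suffix.lower()
    # Fall back to the default when the extension is missing or unknown.
    return EXTENSION_TO_FORMAT.get(suffix, default)
```

With this rule, `infer_format("profile.json")` selects speedscope without a format flag, while an explicit format still overrides whatever the extension suggests.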
[This file](https://gist.githubusercontent.com/gjoseph92/8485ce1b7eeaf7a6496c547133b27224/raw/c689e9c77ff51b8b74753f5b8faf3b0bd513e418/profile-no-negative-durs.json) in Chrome trace format looks very different in speedscope versus `chrome://tracing`:   In...
With multi-threaded profiles, setting `#title=` turns every thread's name into the title, which makes the threads difficult to tell apart. When there are multiple threads, I'd prefer if it kept...
When a worker breaks its connection to the scheduler, or goes too long without heartbeating, the scheduler removes it and reschedules its tasks to run elsewhere. However, other workers may...
In https://github.com/dask/distributed/issues/6110#issuecomment-1105837219, we found that workers were running themselves out of memory to the point where the machines became unresponsive. Because the memory limit in the Nanny is implemented [at...
It would be handy to be able to record multiple measures at once in `MemorySampler`. Particularly, recording `process` and `managed_spilled` at the same time gives you a picture of both...