Tom Augspurger
Perhaps we leave this open to discuss if the default resources for the scheduler should be higher? I don't know what's best here. It'd be nice to have better logs...
Thanks. I think we should start with getting this running on CI and gradually add types one module or so at a time. That'll make reviewing things much easier.
It's possible to only use mypy on specific files. Pandas is going through that now. You can see the configuration starting at https://github.com/pandas-dev/pandas/blob/dbd7a5d3e2f1d196e8634c620fc72db1127de157/setup.cfg#L124. I think the current effort is around...
This fell on the back-burner since it's mostly just a development workflow thing. It shouldn't have any user-facing changes. The high-level split will still be * dask-glm for optimizers like...
FYI @mmccarty fixed the merge conflicts. Will see if CI passes.
> Another question is if one should ever rely on the order of categories in Pandas categorical types...

Only if the categorical is ordered. What does the proposed fixed behavior...
Thanks for the report @zexuan-zhou. Are you able to debug it further? Most likely scikit-learn previously cast a (dask) DataFrame to an ndarray, but no longer does that. We were...
Thanks for the reproducible example. We'll need someone to step through and figure out exactly what changed in scikit-learn / pandas and adapt. I won't have time to work on...
FYI, I started on this at https://gist.github.com/TomAugspurger/2889a052b5fec4d691f83ba2062d2d92. As you predicted, `X.map_blocks(model.predict)` was slow. I stopped as soon as I hit an error, and didn't do any profiling yet. I'll pick...
Oh, and `/profile-server` is going to be extremely useful here. On a whim, I tried `X.map_blocks(delayed(model.predict))` and the scheduler has been at 100% CPU for a minute while the workers...
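For reference, a rough sketch of the `X.map_blocks(model.predict)` pattern mentioned above. This is not the code from the gist: `ThresholdModel` is a hypothetical stand-in for a fitted estimator, and the array shape and chunk sizes are made up for illustration. It only shows the shape bookkeeping (`dtype`, `drop_axis`) that a blockwise predict needs, not the performance problem.

```python
import numpy as np
import dask.array as da


class ThresholdModel:
    """Hypothetical stand-in for any fitted estimator exposing `predict`."""

    def predict(self, X):
        # One prediction per row: 1 if the row sum is positive, else 0.
        return (X.sum(axis=1) > 0).astype("int64")


model = ThresholdModel()

# Assumed toy data: 1000 rows, 4 features, chunked along rows only.
rng = np.random.default_rng(0)
X_np = rng.standard_normal((1000, 4))
X = da.from_array(X_np, chunks=(250, 4))

# `predict` turns an (n, 4) block into an (n,) block, so the feature axis
# is declared as dropped. Passing `dtype` lets dask skip running `predict`
# on a dummy block to infer the output type.
y = X.map_blocks(model.predict, dtype="int64", drop_axis=1)

result = y.compute()
```

Each block is predicted independently, so the input should be chunked along rows only (every block needs all of the features).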