Nick Becker

180 comments by Nick Becker

Dask cuDF will spill from GPU to CPU memory during dataframe operations by default. The threshold for spilling can be configured by using the `device_memory_limit` parameter as you're doing. Dask...

Happy to answer any questions about cuML, if folks are interested. We've done a lot of work to tighten our interface compliance with scikit-learn since summer 2020, which has made...

With the merge of #4800, soft clustering the original dataset with `all_points_membership_vectors` is now available. Please give it a try and file issues if you run into any issues or...

Generalizing from @jhancock1975's code snippet, this error can occur even with correctly formatted data during cross validation (if a fold doesn't get the right subset of labels).

```python
import...
```

Scikit-learn does this under the hood here:

https://github.com/scikit-learn/scikit-learn/blob/e5736afb316038c43301d2c53ce39f9a89b64495/sklearn/ensemble/_forest.py#L371

https://github.com/scikit-learn/scikit-learn/blob/e5736afb316038c43301d2c53ce39f9a89b64495/sklearn/ensemble/_forest.py#L756-L775

It should be possible to do this using `cuml.prims.label.make_monotonic`, which was designed for this use case. This function looks like it might have an expensive JIT cost, though. Perhaps https://github.com/rapidsai/cuml/blob/768a4ed943fd5d33b0fe280b5db579ed918c7cfd/cpp/src_prims/label/classlabels.cuh#L164...
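For readers unfamiliar with the idea, here is a minimal plain-Python sketch of what mapping non-consecutive labels to consecutive integers looks like. It is not the cuML or RAFT implementation: `make_monotonic_sketch` and the sample labels are hypothetical names chosen for illustration, and the approach is the same in spirit as NumPy's `np.unique(..., return_inverse=True)`.

```python
def make_monotonic_sketch(labels):
    """Map arbitrary labels to consecutive integers 0..k-1.

    Hypothetical helper for illustration only; codes follow the sorted
    order of the unique labels.
    """
    classes = sorted(set(labels))
    index_of = {label: i for i, label in enumerate(classes)}
    encoded = [index_of[label] for label in labels]
    return encoded, classes


# Hypothetical, non-consecutive class labels
labels = [10, 3, 10, 7, 3]
encoded, classes = make_monotonic_sketch(labels)
print(encoded)  # [2, 0, 2, 1, 0]
print(classes)  # [3, 7, 10]

# Keeping `classes` around lets predictions be mapped back to the original labels
decoded = [classes[i] for i in encoded]
print(decoded)  # [10, 3, 10, 7, 3]
```

Returning the sorted unique labels alongside the encoded values is what makes the mapping reversible after prediction.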

That sounds like it could be a good solution. I suspect this non-consecutive label issue will keep popping up. Will file a new issue on RAFT, cross-link, and mark it...

Was this closed by https://github.com/rapidsai/cuml/pull/4317 ?

No, this still presents with the example above in 22.06.

```python
train_auc = roc_auc_score(y_true=y_cudf.to_pandas().sort_index(), y_score=y_pd_pred)
# print(y_dask_pred.index.compute())
print(train_auc)
# 0.979820640925938

train_auc = roc_auc_score(y_true=y_cudf.to_pandas(), y_score=y_pd_pred)
# print(y_dask_pred.index.compute())
print(train_auc)
# 0.500788696142471
```
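The two scores differ only in whether the labels are sorted by index, which is consistent with labels and predictions being compared position by position rather than by index. The sketch below illustrates that effect in plain Python under that assumption. The `auc` helper and the toy data are hypothetical and stand in for `roc_auc_score` and the real dataframes.

```python
def auc(y_true, y_score):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half. Inputs are matched purely by position.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels keyed by row index, stored out of index order
labels_by_index = {3: 1, 0: 0, 2: 1, 1: 0}
# Hypothetical predictions, already ordered by index 0, 1, 2, 3
scores_in_index_order = [0.1, 0.2, 0.8, 0.9]

unsorted_labels = list(labels_by_index.values())
sorted_labels = [labels_by_index[i] for i in sorted(labels_by_index)]

aligned_auc = auc(sorted_labels, scores_in_index_order)
misaligned_auc = auc(unsorted_labels, scores_in_index_order)
print(aligned_auc)     # 1.0
print(misaligned_auc)  # 0.25
```

The predictions are identical in both calls; only the row order of the labels changes, which is enough to move the score.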

Care of @bdice, possibly related to how we're packaging libcu++?