Nick Becker
Dask cuDF will spill from GPU to CPU memory during dataframe operations by default. The threshold for spilling can be configured by using the `device_memory_limit` parameter as you're doing. Dask...
Happy to answer any questions about cuML, if folks are interested. We've done a lot of work to tighten our interface compliance with scikit-learn since summer 2020, which has made...
With the merge of #4800, soft clustering the original dataset with `all_points_membership_vectors` is now available. Please give it a try and file an issue if you run into any problems or...
Generalizing from @jhancock1975's code snippet, this error can occur even with correctly formatted data during cross-validation (if a fold doesn't get the right subset of labels).

```python
import...
```
Scikit-learn does this under the hood here: https://github.com/scikit-learn/scikit-learn/blob/e5736afb316038c43301d2c53ce39f9a89b64495/sklearn/ensemble/_forest.py#L371 https://github.com/scikit-learn/scikit-learn/blob/e5736afb316038c43301d2c53ce39f9a89b64495/sklearn/ensemble/_forest.py#L756-L775
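Roughly, the linked forest code does two things, sketched here with NumPy. It encodes arbitrary class labels to consecutive integers with `np.unique(..., return_inverse=True)` during `fit`. It then maps predicted indices back to the original labels with `classes_.take(...)`. The label values below are made up for illustration.

```python
import numpy as np

# Non-consecutive labels, e.g. class ids that skip values.
y = np.array([0, 2, 5, 2, 0, 5])

# Encode labels as consecutive integers 0..n_classes-1.
# `classes` holds the sorted unique labels; `y_encoded` indexes into it.
classes, y_encoded = np.unique(y, return_inverse=True)

# A model trained on y_encoded predicts class indices, not original labels...
pred_encoded = np.array([2, 0, 1])

# ...so map them back to the original label values before returning.
pred = classes.take(pred_encoded)

print(classes, y_encoded, pred)
```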
It should be possible to do this using `cuml.prims.label.make_monotonic`, which was designed for this use case. This function looks like it might have an expensive JIT cost, though. Perhaps https://github.com/rapidsai/cuml/blob/768a4ed943fd5d33b0fe280b5db579ed918c7cfd/cpp/src_prims/label/classlabels.cuh#L164...
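In the cross-validation case, each fold needs to be relabeled against the class set of the *full* label array, not just the labels that happen to land in that fold. This is a CPU-only NumPy sketch of that idea. It is not cuML's `make_monotonic` and does not reproduce its signature. The helper `encode_with_classes` is hypothetical and exists only for this illustration.

```python
import numpy as np


def encode_with_classes(y, classes):
    """Hypothetical helper: map each label in y to its index in sorted `classes`.

    Raises ValueError if y contains a label that is not in `classes`.
    """
    y = np.asarray(y)
    classes = np.asarray(classes)
    if not np.isin(y, classes).all():
        raise ValueError("y contains labels not present in classes")
    # classes is sorted (np.unique output), so searchsorted gives each label's position.
    return np.searchsorted(classes, y)


# Class set taken from the full training labels, before splitting into folds.
y_full = np.array([3, 7, 9, 3, 9, 7])
all_classes = np.unique(y_full)

# A fold that happens to contain no examples of class 7.
y_fold = np.array([9, 3, 9])

# Encoding against the global class set keeps codes consistent across folds.
global_codes = encode_with_classes(y_fold, all_classes)

# Re-deriving classes per fold changes what each code means (here 9 becomes 1).
_, per_fold_codes = np.unique(y_fold, return_inverse=True)

print(global_codes, per_fold_codes)
```

The design choice is to fix the label-to-index mapping once, up front, so a fold missing a class cannot silently shift the meaning of the remaining codes.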
That sounds like it could be a good solution. I suspect this non-consecutive label issue will keep popping up. Will file a new issue on RAFT, cross-link, and mark it...
Was this closed by https://github.com/rapidsai/cuml/pull/4317 ?
No, this still reproduces with the example above in 22.06.

```python
from sklearn.metrics import roc_auc_score

train_auc = roc_auc_score(y_true=y_cudf.to_pandas().sort_index(), y_score=y_pd_pred)
# print(y_dask_pred.index.compute())
print(train_auc)
# 0.979820640925938

train_auc = roc_auc_score(y_true=y_cudf.to_pandas(), y_score=y_pd_pred)
# print(y_dask_pred.index.compute())
print(train_auc)
# 0.500788696142471
```
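The gap between the sorted and unsorted scores is consistent with a row-ordering mismatch, though that is an inference from the numbers, not a confirmed diagnosis. `roc_auc_score` pairs `y_true` and `y_score` by position and ignores pandas indexes. If predictions come back in a different row order than the labels, the pairing is scrambled. This CPU-only pandas sketch uses made-up data to show the effect and the index-alignment fix.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Labels indexed by row id (hypothetical data).
y_true = pd.Series([0, 0, 1, 1, 0, 1], index=[0, 1, 2, 3, 4, 5])

# Scores that rank every positive above every negative, but stored in a
# different row order than y_true (as can happen after a distributed predict).
y_score = pd.Series([0.9, 0.1, 0.8, 0.2, 0.7, 0.3], index=[3, 0, 5, 1, 2, 4])

# roc_auc_score converts both inputs to arrays, so rows are paired by position.
auc_positional = roc_auc_score(y_true, y_score)

# Reindexing the scores to the label index restores the correct pairing.
auc_aligned = roc_auc_score(y_true, y_score.reindex(y_true.index))

print(auc_positional, auc_aligned)
```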
Care of @bdice, possibly related to how we're packaging libcu++?