Nick Becker
This PR implements `{Series, DataFrame}.groupby.rank` based on the same hash partition pattern used by `groupby.{shift, transform, apply}`. - [x] Closes #8658 - [x] Tests added / passed (locally) - [x]...
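For reference, the per-group ranking semantics the PR targets can be illustrated with plain pandas. This is only a sketch of the expected result on a small, made-up frame (the column names `key` and `x` are hypothetical); it does not show the hash-partition implementation itself.

```python
import pandas as pd

# Hypothetical toy data: two groups, one of them containing a tie.
df = pd.DataFrame({"key": ["a", "a", "a", "b", "b"],
                   "x": [3, 1, 2, 5, 5]})

# Rank values within each group; pandas defaults to method="average",
# so tied values share the mean of the ranks they span.
ranks = df.groupby("key")["x"].rank()
print(ranks.tolist())
```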
For pandas API compatibility, we can implement [Series.truncate](https://pandas.pydata.org/docs/reference/api/pandas.Series.truncate.html) and [DataFrame.truncate](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.truncate.html). Truncate is "a useful shorthand for boolean indexing based on index values above or below certain thresholds." The DataFrame method...
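The "shorthand for boolean indexing" description can be checked directly in pandas. The sketch below uses a small, made-up frame with a sorted integer index and compares `truncate` against the equivalent index mask; it shows the pandas behavior to match, not any particular GPU implementation.

```python
import pandas as pd

# Hypothetical toy data with a sorted integer index (truncate requires a sorted index).
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=[1, 2, 3, 4, 5])

# Keep rows whose index lies in [2, 4]; both bounds are inclusive.
truncated = df.truncate(before=2, after=4)

# The same selection written as explicit boolean indexing on the index.
masked = df[(df.index >= 2) & (df.index <= 4)]

print(truncated.equals(masked))
```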
As we now support categorical LightGBM models, we should update the FIL docs - https://github.com/rapidsai/cuml/blob/d78a8056511788f0d04554e376a4c075eb7135c6/python/cuml/fil/fil.pyx#L489 - https://github.com/rapidsai/cuml/tree/branch-22.08/python/cuml/fil#features Context: https://github.com/rapidsai/cuml/issues/1424#issuecomment-1170230052
As noted in https://github.com/rapidsai/cudf/issues/10024, cuML RandomForestClassifier will throw an error if the target column has non-consecutive labels outside of the [0, n) range. This does not occur in scikit-learn,...
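One way a user might sidestep this in the meantime is to remap labels onto consecutive integers before fitting and map predictions back afterwards. The helper below is a hypothetical, pure-Python sketch of that remapping (`encode_labels` is not a cuML function), shown on made-up labels.

```python
def encode_labels(labels):
    """Map arbitrary labels onto consecutive integers in [0, n).

    Returns the encoded labels and the sorted list of original classes,
    so that classes[i] recovers the original label for code i.
    """
    classes = sorted(set(labels))
    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels], classes


# Hypothetical non-consecutive target values.
y = [3, 7, 7, 10, 3]
encoded, classes = encode_labels(y)

# Decoding restores the original labels, e.g. for model predictions.
decoded = [classes[code] for code in encoded]
print(encoded, classes)
```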
I'd like to be able to do a GROUP BY aggregation and then filter the resulting aggregated data based on a condition using the HAVING clause while including the aggregation...
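The requested query shape can be illustrated with Python's built-in `sqlite3` module. Note that this runs against SQLite rather than dask-sql, and the `sales` table and its columns are made up for the example; it only shows the standard GROUP BY ... HAVING behavior being asked for.

```python
import sqlite3

# In-memory SQLite database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 25), ("west", 5), ("west", 8), ("north", 40)],
)

# Aggregate per region, then keep only the groups whose total exceeds 20.
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 20
    ORDER BY region
    """
).fetchall()
conn.close()
print(rows)
```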
I'd like to be able to call LCASE on a string column to convert it to lowercase, like in MySQL. This is an alias for LOWER, which is noted in...
GPU backend dependencies aren't included in the development environment yml files, causing pytests to fail with `--rungpu` out of the box. Datafusion branch: https://github.com/dask-contrib/dask-sql/blob/datafusion-sql-planner/continuous_integration/environment-3.10-dev.yaml It would be useful for there...
I'd like to be able to use HDBSCAN to calculate membership vectors for points, like I can with the CPU library. Per the CPU library [documentation](https://hdbscan.readthedocs.io/en/latest/api.html#hdbscan.prediction.membership_vector), this function "produces a...
This PR updates the existing KMeans notebook to improve clarity and make the performance benefit clear. Closes https://github.com/rapidsai/cuml/issues/4135
For API compatibility, I'd like to be able to generate prediction data after the fact using `generate_prediction_data`, even if I set `prediction_data=False` when I instantiated my clusterer. I can do...