Nick Becker

Results: 55 issues by Nick Becker

This PR implements `{Series, DataFrame}.groupby.rank` based on the same hash partition pattern used by `groupby.{shift, transform, apply}`. - [x] Closes #8658 - [x] Tests added / passed (locally) - [x]...

dataframe
feature
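
As a rough illustration of the hash partition pattern that the `groupby.rank` PR above refers to, the sketch below routes rows to partitions by a hash of the group key, so each group lives entirely in one partition, and then ranks within each partition independently. This is plain Python, not the Dask implementation; `hash_partition` and `rank_within_groups` are hypothetical helper names, and ranks follow input order on ties, which is an assumption made for the example.

```python
from collections import defaultdict
import zlib


def hash_partition(rows, key, npartitions):
    """Route each row to a partition chosen by a hash of its group key."""
    partitions = [[] for _ in range(npartitions)]
    for row in rows:
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        part = zlib.crc32(str(row[key]).encode()) % npartitions
        partitions[part].append(row)
    return partitions


def rank_within_groups(partition, key, value):
    """Assign a 1-based ascending rank per group inside one partition."""
    groups = defaultdict(list)
    for row in partition:
        groups[row[key]].append(row)
    ranked = []
    for members in groups.values():
        ordered = sorted(members, key=lambda r: r[value])  # stable sort
        for rank, row in enumerate(ordered, start=1):
            ranked.append({**row, "rank": rank})
    return ranked


rows = [
    {"id": 0, "g": "a", "x": 30},
    {"id": 1, "g": "b", "x": 5},
    {"id": 2, "g": "a", "x": 10},
    {"id": 3, "g": "b", "x": 7},
    {"id": 4, "g": "a", "x": 20},
]

# Every group is complete within its partition, so no cross-partition step is needed.
result = {}
for part in hash_partition(rows, "g", npartitions=2):
    for row in rank_within_groups(part, "g", "x"):
        result[row["id"]] = row["rank"]
```

Because all rows sharing a key land in the same partition, the per-partition ranks are already the final per-group ranks, whichever partition a group hashes to.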

For pandas API compatibility, we can implement [Series.truncate](https://pandas.pydata.org/docs/reference/api/pandas.Series.truncate.html) and [DataFrame.truncate](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.truncate.html). Truncate is "a useful shorthand for boolean indexing based on index values above or below certain thresholds." The DataFrame method...

feature request
good first issue
cuDF (Python)
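
To make the truncate request above concrete, here is a minimal pure-Python sketch of the semantics: keep only entries whose index value lies between two thresholds, both inclusive. The `truncate` function below is a hypothetical stand-in that borrows the `before` and `after` parameter names from pandas; it is not the pandas or cuDF implementation, and it assumes the input is a list of `(index, value)` pairs.

```python
def truncate(items, before=None, after=None):
    """Keep (index, value) pairs whose index lies in [before, after].

    A bound left as None is treated as unbounded on that side.
    """
    return [
        (idx, val)
        for idx, val in items
        if (before is None or idx >= before) and (after is None or idx <= after)
    ]


series = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

middle = truncate(series, before=2, after=4)  # drops index 1 and index 5
tail = truncate(series, before=4)             # only a lower threshold
```

The filter condition is the boolean-indexing expression that the method is shorthand for.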

As we now support categorical LightGBM models, we should update the FIL docs:
- https://github.com/rapidsai/cuml/blob/d78a8056511788f0d04554e376a4c075eb7135c6/python/cuml/fil/fil.pyx#L489
- https://github.com/rapidsai/cuml/tree/branch-22.08/python/cuml/fil#features
Context: https://github.com/rapidsai/cuml/issues/1424#issuecomment-1170230052

? - Needs Triage
doc
inactive-30d

As noted in https://github.com/rapidsai/cudf/issues/10024, cuML RandomForestClassifier will throw an error if the target column has non-consecutive labels outside of the [0, n) range. This does not occur in scikit-learn,...

feature request
? - Needs Triage
inactive-30d
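
One way to picture the label requirement in the RandomForestClassifier issue above is to remap arbitrary labels onto consecutive codes in [0, n) before fitting and to map predictions back afterwards. The sketch below is plain Python and only an assumption about how a user might work around the error; the issue text does not prescribe it, and `encode_labels` and `decode_labels` are hypothetical helper names.

```python
def encode_labels(labels):
    """Map arbitrary labels to consecutive integer codes 0..n-1."""
    classes = sorted(set(labels))
    to_code = {label: code for code, label in enumerate(classes)}
    return [to_code[label] for label in labels], classes


def decode_labels(codes, classes):
    """Map integer codes back to the original labels."""
    return [classes[code] for code in codes]


y = [10, 3, 10, 7, 3]                # non-consecutive labels outside [0, n)
codes, classes = encode_labels(y)    # codes now lie in [0, 3)
restored = decode_labels(codes, classes)
```

Keeping `classes` around is what allows predicted codes to be translated back into the original label values.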

I'd like to be able to do a GROUP BY aggregation and then filter the resulting aggregated data based on a condition using the HAVING clause while including the aggregation...
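
For reference, the kind of query the HAVING request above describes can be sketched with Python's built-in `sqlite3` module as a stand-in SQL engine. This is not dask-sql, and the `sales` table, its columns, and the threshold are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 25), ("west", 5), ("west", 3), ("north", 40)],
)

# Aggregate per region, then filter on the aggregated value with HAVING.
# The aggregation also appears in the SELECT list.
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 20
    ORDER BY region
    """
).fetchall()
conn.close()
```

WHERE filters rows before grouping, while HAVING filters the groups after aggregation, which is why the `west` group (total 8) is dropped here.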

I'd like to be able to call LCASE on a string column to convert it to lowercase, like in MySQL. This is an alias for LOWER, which is noted in...

good first issue

GPU backend dependencies aren't included in the development environment yml files, causing pytests to fail with `--rungpu` out of the box. Datafusion branch: https://github.com/dask-contrib/dask-sql/blob/datafusion-sql-planner/continuous_integration/environment-3.10-dev.yaml It would be useful for there...

enhancement
needs triage
datafusion

I'd like to be able to use HDBSCAN to calculate membership vectors for points, like I can with the CPU library. Per the CPU library [documentation](https://hdbscan.readthedocs.io/en/latest/api.html#hdbscan.prediction.membership_vector), this function "produces a...

feature request
CUDA / C++
Cython / Python

This PR updates the existing KMeans notebook to improve clarity and make the performance benefit clear. Closes https://github.com/rapidsai/cuml/issues/4135

doc
Cython / Python
non-breaking

For API compatibility, I'd like to be able to generate prediction data after the fact using `generate_prediction_data`, even if I set `prediction_data=False` when I instantiated my clusterer. I can do...

feature request
? - Needs Triage