Nick Becker issues

Results: 55 issues of Nick Becker

Add Groupby.rank for DataFrame and Series GroupBy

This PR implements `{Series, DataFrame}.groupby.rank` based on the same hash partition pattern used by `groupby.{shift, transform, apply}`.

- [x] Closes #8658
- [x] Tests added / passed (locally)
- [x]...
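As a rough illustration of the hash partition pattern mentioned above, the pure-Python sketch below sends every row of a group to the same partition, ranks within each group, and writes the ranks back in the original row order. The function name `groupby_rank`, the `npartitions` parameter, and the choice of average ranks for ties are assumptions made for this example; it is not the code from the PR.

```python
from collections import defaultdict


def groupby_rank(keys, values, npartitions=2):
    """Hypothetical sketch: average rank of each value within its group."""
    # Step 1: hash-partition row positions so that all rows sharing a key
    # land in the same partition.
    partitions = [[] for _ in range(npartitions)]
    for pos, key in enumerate(keys):
        partitions[hash(key) % npartitions].append(pos)

    ranks = [None] * len(keys)
    for part in partitions:
        # Step 2: inside one partition, bucket row positions by group key.
        groups = defaultdict(list)
        for pos in part:
            groups[keys[pos]].append(pos)

        # Step 3: rank each group independently; ties share the mean rank.
        for positions in groups.values():
            ordered = sorted(positions, key=lambda p: values[p])
            i = 0
            while i < len(ordered):
                j = i
                while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
                    j += 1
                mean_rank = ((i + 1) + (j + 1)) / 2  # 1-based ranks i+1..j+1
                for p in ordered[i:j + 1]:
                    ranks[p] = mean_rank
                i = j + 1
    return ranks


keys = ["a", "b", "a", "b", "a"]
values = [10, 5, 10, 7, 3]
result = groupby_rank(keys, values)
```

Because a group never spans two partitions, each partition can be ranked without looking at any other, which is what makes the pattern a fit for partitioned dataframes.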

dataframe

feature

[FEA] DataFrame and Series truncate

For pandas API compatibility, we can implement [Series.truncate](https://pandas.pydata.org/docs/reference/api/pandas.Series.truncate.html) and [DataFrame.truncate](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.truncate.html). Truncate is "a useful shorthand for boolean indexing based on index values above or below certain thresholds." The DataFrame method...
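For reference, the pandas behavior being requested can be sketched as follows. The data is made up for illustration, and the example uses pandas on the CPU only; it says nothing about how a cuDF implementation would look.

```python
import pandas as pd

# Illustrative Series with a sorted integer index.
s = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

# truncate keeps rows whose index lies in [before, after], both ends inclusive.
truncated = s.truncate(before=2, after=4)

# The equivalent boolean indexing that truncate is shorthand for.
masked = s[(s.index >= 2) & (s.index <= 4)]
```

Note that pandas expects the index to be sorted when `truncate` is called.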

feature request

good first issue

cuDF (Python)

[DOC] Update FIL docs to include support for categorical LightGBM

As we now support categorical LightGBM models, we should update the FIL docs:

- https://github.com/rapidsai/cuml/blob/d78a8056511788f0d04554e376a4c075eb7135c6/python/cuml/fil/fil.pyx#L489
- https://github.com/rapidsai/cuml/tree/branch-22.08/python/cuml/fil#features

Context: https://github.com/rapidsai/cuml/issues/1424#issuecomment-1170230052

? - Needs Triage

doc

inactive-30d

[FEA] cuML estimators should support non-consecutive labels outside of [0, n) where appropriate

As noted in https://github.com/rapidsai/cudf/issues/10024, cuML RandomForestClassifier will throw an error if the target column has non-consecutive labels outside of the [0, n) range. This does not occur in scikit-learn,...
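Until estimators handle such labels directly, one common workaround is to map the labels onto [0, n) before fitting and map predictions back afterwards. The sketch below shows that idea in plain Python; the helper names `encode_labels` and `decode_labels` are hypothetical and are not part of cuML or scikit-learn.

```python
def encode_labels(labels):
    """Map arbitrary labels to consecutive integer codes in [0, n)."""
    classes = sorted(set(labels))
    to_code = {label: code for code, label in enumerate(classes)}
    return [to_code[label] for label in labels], classes


def decode_labels(codes, classes):
    """Map integer codes back to the original labels."""
    return [classes[code] for code in codes]


labels = [3, 7, 3, 42, 7]              # non-consecutive, outside [0, n)
codes, classes = encode_labels(labels)  # codes are safe to pass to a classifier
restored = decode_labels(codes, classes)
```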

feature request

? - Needs Triage

inactive-30d

[FEA] Support aliases in aggregation clauses

I'd like to be able to do a GROUP BY aggregation and then filter the resulting aggregated data based on a condition using the HAVING clause while including the aggregation...
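The preview above is cut off, so the following is only a guess at the kind of query involved: it assumes the request is about referring to an aggregation's alias inside HAVING. The sketch uses Python's built-in `sqlite3` module as a stand-in engine (SQLite happens to accept aliases in HAVING); the table and column names are invented, and nothing here reflects how dask-sql parses or plans the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 4), ("east", 3), ("west", 2), ("north", 9)],
)

# HAVING refers to the aggregation through its alias.
alias_rows = conn.execute(
    "SELECT region, SUM(amount) AS total_amount FROM sales "
    "GROUP BY region HAVING total_amount > 5 ORDER BY region"
).fetchall()

# Same query with the aggregation spelled out again in HAVING.
explicit_rows = conn.execute(
    "SELECT region, SUM(amount) AS total_amount FROM sales "
    "GROUP BY region HAVING SUM(amount) > 5 ORDER BY region"
).fetchall()
```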

[FEA] Support LCASE alias for LOWER

I'd like to be able to call LCASE on a string column to convert it to lowercase, like in MySQL. This is an alias for LOWER, which is noted in...
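To make the expected behavior concrete, the sketch below registers a user-defined `LCASE` function in SQLite (via Python's built-in `sqlite3` module) and checks it against the built-in `LOWER`. SQLite is used purely as a stand-in, and the table and data are invented for the example; a real alias in dask-sql would be wired into its own function mapping instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Register LCASE as a one-argument function that lowercases text and
# passes NULL through unchanged, mirroring LOWER.
conn.create_function("LCASE", 1, lambda s: None if s is None else s.lower())

conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("Alice",), ("BOB",), (None,)])

rows = conn.execute(
    "SELECT LCASE(name), LOWER(name) FROM people ORDER BY rowid"
).fetchall()
```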

good first issue

[ENH] Create GPU development environment conda yml files

GPU backend dependencies aren't included in the development environment yml files, causing pytests to fail with `--rungpu` out of the box. Datafusion branch: https://github.com/dask-contrib/dask-sql/blob/datafusion-sql-planner/continuous_integration/environment-3.10-dev.yaml It would be useful for there...

enhancement

needs triage

datafusion

[FEA] Support for HDBSCAN membership_vector and all_points_membership_vectors

I'd like to be able to use HDBSCAN to calculate membership vectors for points, like I can with the CPU library. Per the CPU library [documentation](https://hdbscan.readthedocs.io/en/latest/api.html#hdbscan.prediction.membership_vector), this function "produces a...

feature request

CUDA / C++

Cython / Python

Update KMeans notebook for clarity

This PR updates the existing KMeans notebook to improve clarity and make the performance benefit clear. Closes https://github.com/rapidsai/cuml/issues/4135

doc

Cython / Python

non-breaking

[FEA] HDBSCAN support for generating prediction data even if prediction_data=False

For API compatibility, I'd like to be able to generate prediction data after the fact using `generate_prediction_data`, even if I set `prediction_data=False` when I instantiated my clusterer. I can do...

feature request

? - Needs Triage