Nick Becker comments

Results: 180 comments by Nick Becker

Running CCA on GPU

Hi! I just came across this issue due to the cuML / RAPIDS mention. I wanted to note that we've implemented input-to-output data type consistency for all cuML estimators (not...

📚 Inaccurate pre-trained model predictions master thread

Noticed an example in which the small model fails but the medium model succeeds. `murmured` is incorrectly tagged as a proper noun starting the sentence in the example below (perhaps...

Add Groupby.rank for DataFrame and Series GroupBy

rerun tests

Add Groupby.rank for DataFrame and Series GroupBy

Done, thanks for the bump.

Add Groupby.rank for DataFrame and Series GroupBy

I see some test failures in test_parquet.py. Are any of the parquet tests known to be flaky, or should I look into any unexpected interaction with this PR? I also...

Add Groupby.rank for DataFrame and Series GroupBy

Test failures appear tied to the version of pandas. The Python 3.9 environment uses pandas 1.4, in which groupby.rank behaves differently than in prior versions. The 3.8 environment uses 1.2.5...

Add Groupby.rank for DataFrame and Series GroupBy

I've been thinking about this a bit more, and I've arrived at a question that I think applies to this PR and also the already implemented `groupby.{shift, transform, apply}` (though is...

[BUG] Multiple DataFrame.loc operations gives confusing error message upon compute on Dask-cuDF

While cuDF could raise a more informative error rather than leaking internals, this is a Dask issue due to not being able to align the indexes. We can leave this...

Implementation from cuML in Berttopic

cc @vibhujawa

Implementation from cuML in Berttopic

As a note, [membership_vector and all_points_membership_vectors](https://github.com/rapidsai/cuml/issues/4724) are on our radar for cuML's HDBSCAN. Perhaps this might be an opportunity to define something like `is_hdbscan_like` in the spirit of scikit-learn's `is_classifier`...
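
To make the `is_hdbscan_like` idea a bit more concrete, here is a rough sketch of what a duck-typed check could look like. This is only an illustration: `is_hdbscan_like` does not exist in scikit-learn, cuML, or hdbscan as far as this thread indicates, and the criteria used here (callable `fit` and `fit_predict`, plus a `min_cluster_size` attribute) are assumptions rather than an agreed definition. `FakeHDBSCAN` and `FakeKMeans` are hypothetical stand-ins so the snippet runs without a GPU or any third-party package.

```python
def is_hdbscan_like(estimator) -> bool:
    """Hypothetical helper: True if `estimator` looks like an HDBSCAN model.

    Assumed criteria (not an established API): callable `fit` and
    `fit_predict` methods, plus a `min_cluster_size` attribute.
    """
    has_methods = all(
        callable(getattr(estimator, name, None))
        for name in ("fit", "fit_predict")
    )
    return has_methods and hasattr(estimator, "min_cluster_size")


class FakeHDBSCAN:
    """Stand-in for an HDBSCAN-style estimator (illustration only)."""

    def __init__(self, min_cluster_size=5):
        self.min_cluster_size = min_cluster_size

    def fit(self, X):
        return self

    def fit_predict(self, X):
        # Label every point as noise; real clustering is out of scope here.
        return [-1 for _ in X]


class FakeKMeans:
    """Stand-in for a clusterer that is not HDBSCAN-like (illustration only)."""

    def __init__(self, n_clusters=8):
        self.n_clusters = n_clusters

    def fit(self, X):
        return self

    def fit_predict(self, X):
        return [0 for _ in X]


hdbscan_check = is_hdbscan_like(FakeHDBSCAN())
kmeans_check = is_hdbscan_like(FakeKMeans())
```

A check along these lines would let a caller accept any backend with the expected interface instead of testing for one specific class, though the exact criteria would need to be agreed on by the libraries involved.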