Modify reduce_dimensionality to use fit_transform
Modifying _reduce_dimensionality to use fit_transform in single step, supporting the use of nearest neighbor descent when using cuML UMAP.
Fixes #2335
Before submitting
- [ ] This PR fixes a typo or improves the docs (if yes, ignore all other checks!).
- [X] Did you read the contributor guideline?
- [X] Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
- Adding support for cuML's nn-descent with UMAP
- [ ] Did you make sure to update the documentation with your changes (if applicable)?
- [ ] Did you write any new necessary tests?
Thanks for the PR! I'm seeing some errors popping up here and there. Could you take a look?
@jemather do you want to look at the failures or would it be Ok for me to "take over" this PR to address the test failures?
@betatim Sorry, started working on the changes and did not commit them. Let me see if I can get this done here in the next hour or so and if not happy to hand it over. Thank you for the nudge!
@betatim took me a few tries, but think it should be set now.
Cool beans! Happy that all it took was a nudge :D :D
I don't have direct experience with BERTopic, coming to this as someone who works on cuml. Hence motivated to help move this PR along. @MaartenGr the diff looks reasonable, but I don't know much about the code base. Is there something I can help with regarding this PR or does it "just" need time from one of the maintainers to take a look at it and think about (I say "just" because maintainer time and brain power is one of the rarest resources on planet earth ;) )
Looks like this might be ready to merge? Is there any way we could help with this PR?
It's almost there. My comments that I made on June 6th (see above) weren't addressed just yet. If those are addressed and cover the mentioned use cases, then I can go ahead and merge.
@jemather I picked up this PR to implement the outstanding review comments in #2416. I hope this is fine with you. I kept all the original commits from this PR, so that you get credit for your work.
Closing this as it was picked up and merged in #2416