uwot
Rolling out uwot's C++ code as a header-only library
I wonder whether this would be of interest: squeezing out the C++ code in here into a separate header-only library, in much the same way that https://github.com/LTLA/CppIrlba contains the relevant contents of irlba. Mostly so that I can use it for other applications without the challenge of dragging in R (or Python) runtimes. You could then chuck the library into `inst/include` and we would be able to share a single implementation with relative ease.
I was planning to give it a go on the weekend. Will need to strip out all the Rcpp stuff; I don't know how pervasive that is. Will also need to add a "no-parallel" option that avoids any calls to `<thread>`, as my target system's support for that is kinda wonky.
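A minimal sketch of what such a switch could look like - the macro name `UWOT_NO_PARALLEL` and the `parallel_for` helper here are assumptions for illustration, not uwot's actual API:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Only pull in <thread> when parallelism is wanted; a system with wonky
// std::thread support can compile with -DUWOT_NO_PARALLEL instead.
#ifndef UWOT_NO_PARALLEL
#include <thread>
#endif

// Apply fn(begin, end) over [0, n), either serially or split across threads.
inline void parallel_for(std::size_t n, std::size_t n_threads,
                         const std::function<void(std::size_t, std::size_t)>& fn) {
#ifdef UWOT_NO_PARALLEL
  fn(0, n); // serial fallback: no <thread> symbols referenced at all
#else
  std::vector<std::thread> workers;
  const std::size_t chunk = (n + n_threads - 1) / n_threads;
  for (std::size_t t = 0; t < n_threads; ++t) {
    const std::size_t begin = t * chunk;
    const std::size_t end = std::min(n, begin + chunk);
    if (begin < end) workers.emplace_back(fn, begin, end);
  }
  for (auto& w : workers) w.join();
#endif
}
```

The point of guarding the `#include` itself is that a compiler with a broken thread runtime never even sees the header.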
To the extent it's possible, that's a good idea (I subsequently did a better job of keeping the R-specific parts separate in rnndescent). This is something I had thought about doing at some point in the future, but the main reason I never took it more seriously is that the pure C++ parts aren't very useful on their own (also, I don't know any CMake). The nearest neighbor calculations and initialization all have to be provided separately, so you're really just getting the optimization bit. If that's useful to you, I'm happy to provide what assistance I can.
Yep, I was going to supply the NNs myself (https://github.com/LTLA/knncolle). On a tangentially related note, it would be nice to make a pure C++ port of nndescent available from that interface. I'd be happy to help out there if you're interested.
The initialization... is within the realm of feasibility. I could link to Spectra, or modify CppIrlba to handle `smallest = TRUE`. Not quite sure which one is less work - I'll have to try it out. Was there a reason for the use of Spectra as the default?
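For what it's worth, one route to a `smallest = TRUE` mode that reuses a largest-eigenvalue solver: for a symmetric matrix A whose eigenvalues are bounded above by a shift σ, the largest eigenvalue of σI − A corresponds to the smallest eigenvalue of A. A toy fixed-size power-iteration sketch of that identity (purely an illustration, not CppIrlba's implementation):

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Power iteration for the dominant eigenvalue of a symmetric 3x3 matrix.
inline double largest_eigenvalue(const Mat3& m, int iters = 500) {
  Vec3 v{1.0, 1.0, 1.0};
  for (int it = 0; it < iters; ++it) {
    Vec3 w{0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < 3; ++i)
      for (std::size_t j = 0; j < 3; ++j)
        w[i] += m[i][j] * v[j];
    const double norm = std::sqrt(w[0] * w[0] + w[1] * w[1] + w[2] * w[2]);
    for (std::size_t i = 0; i < 3; ++i) v[i] = w[i] / norm;
  }
  double lambda = 0.0; // Rayleigh quotient v' M v of the converged vector
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 3; ++j)
      lambda += v[i] * m[i][j] * v[j];
  return lambda;
}

// Smallest eigenvalue of A via the shifted matrix sigma*I - A: its
// eigenvalues are sigma - lambda_i(A), so its largest maps to A's smallest.
inline double smallest_eigenvalue(const Mat3& a, double sigma) {
  Mat3 b{};
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 3; ++j)
      b[i][j] = (i == j ? sigma : 0.0) - a[i][j];
  return sigma - largest_eigenvalue(b);
}
```

The same shift trick is how "smallest eigenvectors of the graph Laplacian" problems are often fed to solvers that only chase the largest end of the spectrum.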
Anyway, testing out the initialization is probably a solid weekend project on my side. If you have the bandwidth, maybe you could reorganize the stuff across `src/` and `inst/include` to create a pure C++ interface to your optimization code. Then we might eventually be able to plug and play with all three components (NN, init, optim).
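The plug-and-play split might look something like this - every type and function name below is a hypothetical placeholder for the eventual interface, with stubbed bodies:

```cpp
#include <cstddef>
#include <vector>

// Output of the NN component (e.g. from knncolle): per-point neighbor
// indices and distances.
struct Neighbors {
  std::vector<std::vector<int>> index;
  std::vector<std::vector<double>> distance;
};

// Row-major n x ndim coordinate matrix.
using Embedding = std::vector<double>;

// init component: produce starting coordinates (spectral, random, ...).
inline Embedding initialize(std::size_t n, std::size_t ndim) {
  return Embedding(n * ndim, 0.0); // stub: zeros stand in for a real init
}

// optim component: refine coordinates in place given the neighbor graph.
inline void optimize(const Neighbors& nn, Embedding& embedding, int n_epochs) {
  (void)nn;
  (void)embedding;
  (void)n_epochs; // stub: the SGD loop over graph edges would live here
}
```

The appeal of this shape is that each stage only communicates through plain containers, so any of the three pieces can be swapped out independently.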
> Yep, I was going to supply the NNs myself (https://github.com/LTLA/knncolle). On a tangentially related note, it would be nice to make a pure C++ port of nndescent available from that interface. Would be happy to help out there if you're interested.
That can happen too... eventually.
> The initialization... is within the realm of feasibility. Could link to Spectra, or could modify CppIrlba to handle `smallest = TRUE`. Not quite sure which one is less work - will have to try it out. Was there a reason for the use of Spectra as the default?
At the time, the irlba `partial_eigen` function was described as "somewhat experimental" and in practice was a lot slower than using RSpectra. Maybe that's changed now.
> Anyway, testing out the initialization is probably a solid weekend project on my side. If you have the bandwidth, maybe you could reorganize the stuff across `src/` and `inst/include` to create a pure C++ interface to your optimization code. Then we might eventually be able to plug and play with all three components (NN, init, optim).
Not sure about timelines, but I will start taking a look to see whether this seems achievable in a reasonable amount of time, or whether it's going to reveal that larger structural changes will be required.
I failed to make any progress this weekend, but it is closer to the top of my to-do pile.
No worries. I failed to make any progress as well - I got distracted by https://github.com/LTLA/qdtsne.
Made a start on the initialization: https://github.com/LTLA/umappp.
The most that can be said right now is that it compiles and runs.
Sorry I have made zero contributions to this so far. I was traveling for the last two weeks and had little to no internet access.
No problems whatsoever - it is, in fact, already done! The code in uwot's `inst/include` was easier to read than I thought, so it was fairly straightforward to get what I needed. Check it out:
Close enough, I'd say. I know we're identical up to the optimization, so I'm guessing that the differences are due to our different PRNGs - I'm using `std::mt19937_64` to avoid the need to manage another dependency.
I didn't add any of the other bells and whistles, e.g., no support for supervised training, no support for `tumap` or `largevis`. I don't need them personally, but I could work on that if we wanted to turn uwot into an R wrapper around a fully-featured C++ library. Interested to hear your thoughts here - I don't mind either way.
In the meantime, I'll post a few more issues on things I discovered along the way.
> Close enough, I'd say. I know we're identical up to the optimization, so I'm guessing that the differences are due to our different PRNGs - I'm using `std::mt19937_64` to avoid the need to manage another dependency.
Does changing the PRNG away from the Tausworthe88 have an effect on the speed?
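For context, here are the two generators being compared - a minimal taus88 (L'Ecuyer's three-component Tausworthe) next to `std::mt19937_64`. This is a sketch of the core update only; how uwot actually seeds and consumes its generator may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Three-component Tausworthe generator (taus88, L'Ecuyer 1996).
// Seeds must satisfy s1 > 1, s2 > 7, s3 > 15.
struct Taus88 {
  std::uint32_t s1, s2, s3;
  std::uint32_t operator()() {
    std::uint32_t b;
    b = ((s1 << 13) ^ s1) >> 19; s1 = ((s1 & 0xFFFFFFFEu) << 12) ^ b;
    b = ((s2 << 2)  ^ s2) >> 25; s2 = ((s2 & 0xFFFFFFF8u) << 4)  ^ b;
    b = ((s3 << 3)  ^ s3) >> 11; s3 = ((s3 & 0xFFFFFFF0u) << 17) ^ b;
    return s1 ^ s2 ^ s3;
  }
};

// In the optimizer the PRNG's job is just picking negative-sample vertex
// indices; modulo is slightly biased but fine for a sketch.
inline std::size_t sample_index_taus(Taus88& rng, std::size_t n) {
  return rng() % n;
}

inline std::size_t sample_index_mt(std::mt19937_64& rng, std::size_t n) {
  return rng() % n;
}
```

The taus88 step is a handful of shifts and XORs per draw, while `std::mt19937_64` has a much larger state (312 64-bit words) and a periodic twist, so a speed difference in a tight sampling loop is at least plausible.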
> I didn't add any of the other bells and whistles, e.g., no support for supervised training, no support for `tumap` or `largevis`. I don't need them personally, but I could work on that if we wanted to turn uwot into an R wrapper around a fully-featured C++ library. Interested to hear your thoughts here - I don't mind either way.
It would be a shame not to use umappp if possible. I wouldn't want to weigh it down with features that not many people care about (I have zero idea whether anyone makes use of `tumap` or `largevis`), but I also don't want to keep track of two separate but similar C++ code bases (although they haven't actually changed very much).
`tumap` is great 😅