Do you want a PR with CUDA support?
Love your crate, excellent work! I need to fork and add GPU support via either the rust-cuda or wgpu crates. I'm uncertain of the scope right now, but at the very least I'll be adding support to the LBFGS solver/optimizer. I have loads of ML tasks to complete (many still open-ended), so this can and likely will expand into other modules/structs throughout the smartcore crate.
I need this for myself, so I'll be doing it regardless. Just wondering: is it worth my while to put in the extra time to polish it into a PR? In other words, if I submitted a PR with GPU support, would you review and potentially merge it? Or am I better off just concentrating on my own needs and not worrying about the extra polish?
If you want me to go the PR route, let me know anything you'd like me to keep in mind while implementing this. It would obviously be a separate feature. Do you want rust-cuda, wgpu, or OpenCL, or do you have another preference?
Cheers, Matt
Thanks, Matt, for considering contributing. It would be great to have some GPU support if that means improving some algorithms, ideally several of the ones that can gain from using GPUs.
To understand the roadmap for smartcore, you can take a peek at these links: the discussion before version 0.4 was released and the roadmap.
I'll summarise here the core concepts of smartcore's library design (see also CONTRIBUTING.md):
- accessibility:
  - keep the API as Pythonic as possible, ideally mimicking/overlapping with scikit-learn
  - keep the library easy to develop for Rust beginners (e.g. no macros, only pure Rust)
  - keep the library as safe as Rust can be (e.g. no bindings to other languages, no unsafe blocks)
- use cases:
  - target embedded devices and in-browser usage (i.e. keep the library size as small as possible)
  - target software engineers who want to learn ML and/or embed ML in their Rust services, but also ML practitioners who want to learn Rust

Please always consider these base concepts when contributing.
About your questions: exactly as you said, it would be nice to keep this as a feature, but that requires that all the libraries used by the feature be written in Rust; we may make an exception for temporary bindings that will be ported to native Rust in the future (otherwise see the option below about creating a separate package).
> Is it worth my while to put in the extra time to polish it into a PR? In other words, if I submitted a PR with GPU support, would you review and potentially merge it? Or am I better off just concentrating on my own needs and not worrying about the extra polish?
If you can do it while preserving the base principles, it would be nice for the library to support more substrates.
> I'll be adding support to the LBFGS solver/optimizer
Do you have an estimate of what kind of performance gains we are talking about in this case?
If you think you can commit to this in the medium term, maybe with your addition we can also target GPU-based edge computing or cloud deployments running on GPUs. If so, that would be nice, but for this we could open a new repository, something like smartcore-gpu, forking the base smartcore and providing a new package of its own on crates.io. That way people can pick the right package for their needs, and we keep the CPU/GPU distinction clear, since we are talking about different computational models and architectures, which makes a big difference to people working with energy-efficient chips. In the case of a new package, this would be a great opportunity to parallelise some compute, and you would also be freer to make your own decisions about which libraries to support, while keeping alignment with the smartcore principles as much as possible (I can create a repo in the smartcorelib organisation and set you as owner).
Personally I lean more towards something like SIMD or PTX (there are probably some nice Rust libraries that wrap these computation models, but I am not an expert and it would probably take too much effort). Hope I provided all the references to allow your decision. Let me know if I can help you with anything.
Welcome if you decide to get on board! Lorenzo
Check also #207 and #4.
Sounds great! I looked at those other two issues. Triton requires C bindings, so if you don't mind I'm not going to touch that; if prior experience with C bindings is anything to go by, it's going to make me want to throw my laptop out the window. I'd prefer to stick with pure Rust and not bother with C bindings.
emu has 9,700 downloads while the wgpu crate has 8.8 million, and both achieve the same goal of supporting multiple hardware vendors. If you don't mind, I'll stick with the tried and tested wgpu crate.
I already have smartcore forked, and will get to work on this starting today. Logistic regression and the various clustering algorithms will definitely get GPU support, plus I would imagine linear regression as well, as I can't see how I won't need that. Beyond that, I'm just not sure right now.
Thinking about how to implement this, I think the best way without introducing breaking changes is obviously to add a "gpu" feature. Then I'll simply add `set_gpu_threshold(&mut self, threshold: usize)` functions in various places. The threshold will default to something like 250,000 or so, and others can set it as desired.
With logistic regression, for example, when running the fit() function, if the number of rows is below the threshold everything remains on the CPU; if above, the matrix is transferred to the GPU and the operations run there. That will suit my needs, and I think it's the cleanest and simplest way to implement this without introducing any changes to the existing API.
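To make that concrete, here's a rough sketch of the shape I have in mind. The struct, fields, and helper names below are purely illustrative, not existing smartcore API:

```rust
// Sketch only: a hypothetical opt-in threshold behind a proposed "gpu" feature.
pub struct GpuAwareEstimator {
    /// Minimum number of samples before work is offloaded to the GPU.
    gpu_threshold: usize,
}

impl Default for GpuAwareEstimator {
    fn default() -> Self {
        // Roughly the default proposed above; callers can override it.
        Self { gpu_threshold: 250_000 }
    }
}

impl GpuAwareEstimator {
    pub fn set_gpu_threshold(&mut self, threshold: usize) {
        self.gpu_threshold = threshold;
    }

    pub fn fit(&mut self, x: &[Vec<f64>], y: &[f64]) {
        if self.should_use_gpu(x.len()) {
            self.fit_gpu(x, y); // copy the matrix to the GPU and run the heavy ops there
        } else {
            self.fit_cpu(x, y); // small inputs stay on the CPU, exactly as today
        }
    }

    fn should_use_gpu(&self, n_rows: usize) -> bool {
        // cfg!() resolves at compile time, so without the "gpu" feature this is
        // always false and the compiler can drop the GPU branch entirely.
        cfg!(feature = "gpu") && n_rows >= self.gpu_threshold
    }

    fn fit_cpu(&mut self, _x: &[Vec<f64>], _y: &[f64]) { /* existing CPU path */ }
    fn fit_gpu(&mut self, _x: &[Vec<f64>], _y: &[f64]) { /* wgpu-backed path */ }
}
```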
Does that work for you? Let me know if you would prefer something different.
If you're wondering, smartcore is helping develop and train the best damn open source NLU engine on the planet, baby! https://cicero.sh/sophia/
I trained the new POS tagger yesterday, which involved 47,000 logistic regression models, one per ambiguous word. Now I'm about to enhance the NLU engine with advanced contextual awareness, which greatly increases the computational needs and is why I need GPU support. I dislike Python, love Rust, think smartcore is simply awesome, and want to stay on this pipeline, so adding GPU support to smartcore makes sense for me. Great that you're interested, so I'll go ahead and add the necessary comments, documentation, tests, etc. and polish it into a PR for your review. From there, you can decide whether you want this in the main smartcore crate or branched off into a separate GPU crate, but it shouldn't introduce any breaking changes to the API, so I don't see the harm in keeping it in this crate. We'll see what happens though.
I think I'm good to go here, but if anything I said rubbed you the wrong way, let me know and I'll pivot as necessary. Otherwise, I'll be back shortly with a polished and published PR.
Thanks for bringing smartcore into the world. This crate is beautiful.
Cheers, Matt
Hi Lorenzo,
I was a little busy with summer holidays and family get-togethers, but I'm back at the grind now. I've been working hard on a transparent GPU acceleration layer, but will pivot off it for a few days, so I decided to just throw this out for now: https://github.com/cicero-ai/smartcore/commit/1b3ce9c
Essentially everything is self-contained in the /src/gpu/ directory, aside from a new error struct at /gpu/error/gpu_error.rs and a couple of other minor changes.
As with everything in life, it turned into a larger project than initially expected. There's still lots to do, but it's taking form and I'm almost getting a smile on my face now. Lots of tiny design and architecture hurdles keep popping up, but it's getting there.
I'm keeping modifications to existing code at an absolute minimum, and once done it will just be a nice transparent GPU acceleration layer. You'll be able to tell your users, "if you have a decent machine with a GPU, go ahead and upgrade, enable the new 'gpu' feature, run your code again with zero changes, and get a nice performance boost".
Everything is aggressively cached and lazily loaded via the GpuStation struct, as is the norm in quality ML packages, and it's all designed in a very non-intrusive way. There will be a few lines added here and there to the fit() functions to check whether the algorithm supports the GPU and whether the matrix meets the row/sample threshold, but that's it. There will be no other changes to your code. If the matrix is over the threshold (defaulting to ~80k, but developer-defined), processing passes to the GPU; otherwise it stays on the CPU.
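For a rough idea of the lazy-loading and caching pattern, here's a small sketch. The actual GpuStation in the commit may look different; the fields, helper types, and method names below are assumptions for illustration only:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Stand-in for whatever wgpu handles (device, queue, compute pipeline) the real code holds.
pub struct CompiledPipeline;

// Illustrative version of the "aggressively cached, lazily loaded" idea.
pub struct GpuStation {
    // Compiled pipelines cached by shader name, so each WGSL kernel is built at most once.
    pipelines: Mutex<HashMap<&'static str, Arc<CompiledPipeline>>>,
}

static GPU: OnceLock<GpuStation> = OnceLock::new();

impl GpuStation {
    /// Lazily initialize the GPU context on first use; later calls reuse the same instance.
    pub fn get() -> &'static GpuStation {
        GPU.get_or_init(|| GpuStation {
            // Adapter/device acquisition via wgpu would happen here, exactly once.
            pipelines: Mutex::new(HashMap::new()),
        })
    }

    /// Fetch a pipeline from the cache, compiling it on first request.
    pub fn pipeline(&self, shader_name: &'static str) -> Arc<CompiledPipeline> {
        let mut cache = self.pipelines.lock().unwrap();
        Arc::clone(cache.entry(shader_name).or_insert_with(|| {
            // Compile the WGSL source into a wgpu compute pipeline here (done once).
            Arc::new(CompiledPipeline)
        }))
    }
}
```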
There will be no LBFGS support, as it's not amenable to the GPU because it's too sequential. Instead, I'll add gradient descent to logistic regression, which I still need to integrate. The WGSL code is from LLMs and needs to be vetted, as I'm new to WGSL, but the rest of the code is mine. I haven't run a single thing on the GPU yet, as I've been busy getting the architecture in place, but I believe this framework is a very solid step in the right direction.
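For reference, the math being offloaded is just the standard full-batch gradient step for logistic regression. Here's a plain CPU sketch (names and shapes illustrative only) of the dense matrix work the WGSL kernels would take over:

```rust
/// One full-batch gradient-descent step for logistic regression, written as a CPU
/// reference. The X·w forward pass and the Xᵀ·(p − y) accumulation are the parts
/// that map well onto GPU compute shaders.
fn gradient_step(x: &[Vec<f64>], y: &[f64], weights: &mut [f64], learning_rate: f64) {
    let n = x.len() as f64;
    let mut grad = vec![0.0; weights.len()];

    for (row, &target) in x.iter().zip(y) {
        // Forward pass: p = sigmoid(row · w)
        let z: f64 = row.iter().zip(weights.iter()).map(|(xi, wi)| xi * wi).sum();
        let p = 1.0 / (1.0 + (-z).exp());

        // Accumulate Xᵀ·(p − y); this per-row work is embarrassingly parallel.
        let err = p - target;
        for (g, xi) in grad.iter_mut().zip(row) {
            *g += err * xi;
        }
    }

    // Parameter update: w ← w − lr · grad / n
    for (w, g) in weights.iter_mut().zip(&grad) {
        *w -= learning_rate * g / n;
    }
}
```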
It will be nice once it's done, but I need to pivot off this for a week or two and concentrate on my own work again. That, and you know those times when you know the best thing for the project is to put it down and come back with a clear mind? I'm there right now, so don't worry about the mess; I'll clean it up.
Just wanted to throw this out there for now. Hang tight, will swing back around to this in the near future and get it done.
If you have time, please take a look and let me know your thoughts.
Cheers, Matt
PS. Yes, I'm aware I haven't run "cargo fmt" yet. I went blind years ago, so I prefer long lines as they're more accessible via screen reader; I'll run fmt later.
Thank you, I am taking a look. Just for historical correctness: the library was created by Vlad Orlov; I became maintainer in 2019.
You can open a PR and target the development branch. It is easier to work there.
Sounds good. Yeah, nowhere close to PR just yet.
Let me pivot away from this for a couple weeks, come back to it and give it another week of work, then should be ready for a PR.
Matt
OK, whatever you feel comfortable with. If you open a PR I can help with the docstrings and the tests in the meantime.
EDIT: also, this way we can tackle design choices early in the process, for example the usage of WGSL code strings or whether we can reuse existing traits for GpuMatrix.
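For example, on the WGSL-strings question, something like this (purely illustrative, shader name and content made up) is one option; the alternative would be `include_str!("shaders/matmul.wgsl")` to keep shaders in separate .wgsl files, which is exactly the kind of choice worth settling early:

```rust
// WGSL kernel carried as an inline Rust string constant: easy to review in a PR,
// but it mixes shader source with Rust code.
const MATMUL_WGSL: &str = r#"
@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    // kernel body elided
}
"#;

fn main() {
    // At runtime the string would be handed to wgpu's shader-module creation.
    println!("shader source is {} bytes", MATMUL_WGSL.len());
}
```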