
GPU backend with Emu

Open · calebwin opened this issue 3 years ago · 4 comments

Hi,

This looks neat! I was curious about how hard it would be to implement a GPU-accelerated backend with Emu. Would it amount to implementing the API in linalg?

calebwin avatar Sep 30 '20 15:09 calebwin

Hi @calebwin, glad to get in touch and thanks for looking into SmartCore!

In theory yes, but in practice it depends on how well the methods from BaseMatrix and BaseVector map to the idea of parallelizing your logic, and on the type constraints imposed by Emu.

Also, there is the question of a trade-off between the performance gained by shifting computation to the GPU and the overhead of copying arrays to the GPU. From the little I know about GPUs, it seems to me that we have a better chance of improving the library's performance by targeting complex methods that parallelize well and span many CPU cycles than by implementing every single method from BaseMatrix and BaseVector. Is this correct? If yes, take a look at the matrix decomposition routines that I've copied from the Numerical Recipes book:

  • https://docs.rs/smartcore/0.1.0/smartcore/linalg/evd/index.html
  • https://docs.rs/smartcore/0.1.0/smartcore/linalg/lu/index.html
  • https://docs.rs/smartcore/0.1.0/smartcore/linalg/qr/index.html
  • https://docs.rs/smartcore/0.1.0/smartcore/linalg/svd/index.html

For example, if you manage to improve the performance of SVD and QR decomposition, you automatically improve the performance of the linear regression and PCA methods.
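
Just to illustrate why (with made-up trait and function names, not the actual smartcore signatures): if an estimator is written against a decomposition trait, it automatically benefits from any matrix type whose qr() is faster.

```rust
// Hypothetical sketch; `QrDecomposable` and `solve_least_squares` are
// illustrative names, not smartcore's actual API.
pub trait QrDecomposable: Sized {
    /// Returns (Q, R) with self = Q * R.
    fn qr(&self) -> (Self, Self);
}

/// A least-squares solver that talks to the matrix only through the trait.
/// Plug in a matrix type whose `qr()` runs on the GPU and this function, and
/// everything built on top of it, speeds up without any changes here.
pub fn solve_least_squares<M: QrDecomposable>(x: &M) -> (M, M) {
    let (q, r) = x.qr();
    // A real implementation would continue with Q^T * y and back-substitution
    // against R; only the decomposition step is shown.
    (q, r)
}
```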

In any case, I like your idea. Feel free to experiment and let me know if you run into any problems caused by the API structure and method signatures. In the worst case we can always change methods in the linalg module if needed.

VolodymyrOrlov avatar Sep 30 '20 18:09 VolodymyrOrlov

> In theory yes, but in practice it depends on how well the methods from BaseMatrix and BaseVector map to the idea of parallelizing your logic, and on the type constraints imposed by Emu.

I see. Functions like matmul map easily, but if functions like get are used heavily (and it looks like they are), then the API doesn't work so well.

> Also, there is the question of a trade-off between the performance gained by shifting computation to the GPU and the overhead of copying arrays to the GPU. From the little I know about GPUs, it seems to me that we have a better chance of improving the library's performance by targeting complex methods that parallelize well and span many CPU cycles than by implementing every single method from BaseMatrix and BaseVector. Is this correct? If yes, take a look at the matrix decomposition routines that I've copied from the Numerical Recipes book:

This is a valid concern, but I think having a DeviceDenseMatrix, etc. that contains a DeviceBox<[f32]> would allow the data to persist on the GPU, eliminating unnecessary transfers.
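
Something like this rough sketch is what I have in mind, assuming Emu's emu_core crate and its DeviceBox<[f32]> buffer type as shown in Emu's README (upload/readback calls are left out, since their exact signatures are Emu implementation details):

```rust
// Rough sketch; assumes emu_core's `DeviceBox<[f32]>` GPU buffer type.
use emu_core::prelude::*;

/// A dense matrix whose storage stays resident on the GPU. The data is
/// uploaded once when the `DeviceBox` is created; chained operations
/// (matmul, QR, SVD, ...) then run kernels against `data` in place, so the
/// only transfers are the initial upload and an explicit final readback.
pub struct DeviceDenseMatrix {
    data: DeviceBox<[f32]>, // device-resident, row-major contents
    nrows: usize,
    ncols: usize,
}

impl DeviceDenseMatrix {
    /// Wrap an already-uploaded device buffer together with its shape.
    pub fn new(data: DeviceBox<[f32]>, nrows: usize, ncols: usize) -> Self {
        DeviceDenseMatrix { data, nrows, ncols }
    }

    pub fn shape(&self) -> (usize, usize) {
        (self.nrows, self.ncols)
    }
}
```

Per-element get/set would still force a readback, which is why the bulk operations are the ones worth routing through a type like this.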

> For example, if you manage to improve the performance of SVD and QR decomposition, you automatically improve the performance of the linear regression and PCA methods.

Right, I think this would be the way to implement a GPU backend: rewrite algorithms so they are either specialized for Emu or built on common functions in BaseMatrix that have GPU-accelerated versions.
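
As a rough sketch of the second option (illustrative trait and type names, not smartcore's actual API): algorithms stay written against one trait, the CPU matrix gets a straightforward implementation, and a device-resident matrix supplies the same method by launching an Emu kernel instead.

```rust
// Illustrative sketch; `Reduce`, `DenseMatrix` and `GpuDenseMatrix` are
// hypothetical names, not smartcore's API.
pub trait Reduce {
    /// Sum of all elements.
    fn sum(&self) -> f32;
}

/// Host-memory matrix: plain CPU implementation.
pub struct DenseMatrix {
    values: Vec<f32>,
}

impl Reduce for DenseMatrix {
    fn sum(&self) -> f32 {
        self.values.iter().sum()
    }
}

/// Device-resident matrix (see the DeviceDenseMatrix sketch above): the same
/// method would launch an Emu reduction kernel over the device buffer and
/// read back only the single scalar result.
pub struct GpuDenseMatrix {
    // data: DeviceBox<[f32]>, nrows, ncols, ...
}

impl Reduce for GpuDenseMatrix {
    fn sum(&self) -> f32 {
        todo!("dispatch an Emu reduction kernel here")
    }
}

/// Any algorithm written against the trait works with either backend.
pub fn mean<M: Reduce>(m: &M, n: usize) -> f32 {
    m.sum() / n as f32
}
```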

calebwin avatar Sep 30 '20 18:09 calebwin

I also want to add that we can iteratively improve the library's performance by reducing reliance on methods like get and set once the initial integration with Emu is done. I was going to do that anyway after the initial "expansion" phase of the project, where I try to cover as many ML methods as I can in a short period of time. During this phase I am not focusing on low-level optimization, while at the same time trying not to close the door on future improvements by making bad architectural decisions. But I am open to start moving code around to improve performance now if you are willing to help me with that.

VolodymyrOrlov avatar Sep 30 '20 18:09 VolodymyrOrlov

Can the AI developed in smartcore learn to code this backend on its own?

malv-c avatar Oct 21 '20 08:10 malv-c