metal-rs
metal-rs copied to clipboard
Matrix multiplication support
Edit: outdated, see https://github.com/ivarflakstad/metal-rs for current state.
Leaving open for visibility.
Hey! Gave this a go.
Output from running cargo run --release --example mps-matrix-multiplication --features=mps
Correctness:
[=================================================] ✅
Performance:
Generating input matrices: (f32 4096x4096 and f16 4096x4096)
Running with transpose left: false, transpose right: false, alpha: 1, beta: 0
[=================================================] ✅
Avg GFLOPS: 2714.017354808345
Total time: 2.534734497s
Avg time: 50.694689ms
Running with transpose left: true, transpose right: false, alpha: 1, beta: 0
[=================================================] ✅
Avg GFLOPS: 2544.2273256244875
Total time: 2.702094294s
Avg time: 54.041885ms
Running with transpose left: false, transpose right: true, alpha: 1, beta: 0
[=================================================] ✅
Avg GFLOPS: 2696.778482443825
Total time: 2.550065042s
Avg time: 51.0013ms
Running with transpose left: false, transpose right: false, alpha: 0.5, beta: 0
[=================================================] ✅
Avg GFLOPS: 2668.0019304379844
Total time: 2.577410417s
Avg time: 51.548208ms
Running with transpose left: false, transpose right: false, alpha: 1, beta: 0.5
[=================================================] ✅
Avg GFLOPS: 2667.1376704607605
Total time: 2.578379166s
Avg time: 51.567583ms
@cwfitzgerald (Sorry for the ping if you're not the correct person).
We're iterating quite a bit on this here -> https://github.com/ivarflakstad/metal-rs We would love some feedback if upstreaming here makes sense or not.
I'm not really involved in metal-rs development outside of administration and releases, so not really sure what fits and what doesn't :)
Fair enough 😊 Maybe you know who does?
@kvark Do you know who might be involved with metal-rs?
@Wumpf Maybe ?
After having matured a bit this and implemented the Matrix
in https://github.com/huggingface/candle/ I feel like this PR should be trimmed down to only get the MPSMatrix
support and remove the rest of the sugar (being more aligned with the rest of the crate).
Ofc having maintainer advice would be helpful.
Also in order to be able to publish on our side, we're likely going to fork metal-rs
for the time being to have access to those objects.
We might return to metal-rs
once we get rid of MPSMatrix
dependency (if we can using metal-flash-attention, which seems faster, but currently isn't as widely portable as MPSMatrix since it doesn't work on first gen m1).
Either that or when this PR gets included.