mistral.rs
We need parallel linear layers:
- [ ] RowParallelLinear
- [ ] MergedColumnParallelLinear
- [ ] QKVParallelLinear
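
For readers unfamiliar with the layer names in the checklist, here is a minimal sketch of the two underlying sharding schemes, assuming the usual Megatron-style convention: a column-parallel layer splits the weight along the output dimension and concatenates the per-shard outputs, while a row-parallel layer splits along the input dimension and sums the per-shard partial outputs (an all-reduce). The merged and QKV variants are, under that convention, column-parallel layers over several fused projections. The functions `matvec`, `column_parallel`, and `row_parallel` are hypothetical names for illustration, not the mistral.rs API; shards are simulated in a loop on one device, with no quantization.

```rust
/// Dense reference: y = W x, with W stored as `out_dim` rows of `in_dim` values.
fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Column-parallel sketch: each shard owns a contiguous block of output rows,
/// sees the full input, and the per-shard outputs are concatenated.
/// Assumes the output dimension is divisible by `shards`.
fn column_parallel(w: &[Vec<f32>], x: &[f32], shards: usize) -> Vec<f32> {
    let rows_per_shard = w.len() / shards;
    w.chunks(rows_per_shard)
        .flat_map(|block| matvec(block, x))
        .collect()
}

/// Row-parallel sketch: each shard owns a slice of the input dimension and
/// produces a full-size partial output; summing the partials stands in for
/// the all-reduce. Assumes the input dimension is divisible by `shards`.
fn row_parallel(w: &[Vec<f32>], x: &[f32], shards: usize) -> Vec<f32> {
    let cols_per_shard = x.len() / shards;
    let mut y = vec![0.0f32; w.len()];
    for s in 0..shards {
        let (lo, hi) = (s * cols_per_shard, (s + 1) * cols_per_shard);
        // The slice of W that this shard would hold in its own memory.
        let w_shard: Vec<Vec<f32>> = w.iter().map(|row| row[lo..hi].to_vec()).collect();
        let partial = matvec(&w_shard, &x[lo..hi]);
        for (acc, p) in y.iter_mut().zip(partial) {
            *acc += p;
        }
    }
    y
}
```

Both sharded functions should reproduce `matvec` for any shard count that divides the relevant dimension. In a real multi-GPU setup the concatenation and the sum would be collective operations across devices rather than local loops.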
Is the plan to implement tensor sharding for both quantized and non-quantized versions?
Yes, that is the plan.
I'm quite interested in CUDA parallelization of the quantized models!
Yes! We are beginning work on this topic now.