Jiaxin Shan

Results: 193 comments by Jiaxin Shan

Is there a way to support distributed training? We do not have that many 80G cards.

@davidearlyoung Thanks for all the details. Do you know whether distributed inference works? We have some A100-40G cards and would rather not resort to quantization. We are...

@davidearlyoung I really appreciate your informative analysis! Thanks a lot! > I personally do not know if distributed inference works for grok-1 in pytorch. Yes! That's my question to the...