Ruoyu Qin

Results: 1 issue by Ruoyu Qin

Hello! The current model loading method is fixed: it behaves the same way regardless of the tensor parallel size. Each rank in a TP group reads the full weight file, and...
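To make the cost of this pattern concrete, here is a minimal sketch in plain Python. It is not the project's actual loader: `read_full_weight_file`, `load_for_rank`, the `linear.weight` name, and the toy 8x4 matrix are all hypothetical. The issue text is cut off after the read step, so the row-wise slicing each rank does afterwards is an assumption made only for illustration.

```python
def read_full_weight_file():
    # Hypothetical stand-in for reading the whole checkpoint from disk.
    # One toy weight matrix with 8 rows and 4 columns (32 elements).
    return {"linear.weight": [[float(r * 4 + c) for c in range(4)] for r in range(8)]}


def load_for_rank(rank, tp_size):
    # Every rank reads the full file, whatever tp_size is.
    full = read_full_weight_file()
    elements_read = sum(len(rows) * len(rows[0]) for rows in full.values())

    # Assumed follow-up step: keep only this rank's contiguous block of rows.
    shard = {}
    for name, rows in full.items():
        per_rank = len(rows) // tp_size
        shard[name] = rows[rank * per_rank:(rank + 1) * per_rank]
    return shard, elements_read


tp_size = 4
results = [load_for_rank(rank, tp_size) for rank in range(tp_size)]

# Elements read across the whole TP group vs. elements actually kept.
total_read = sum(n for _, n in results)
total_kept = sum(len(rows) * len(rows[0]) for shard, _ in results for rows in shard.values())
print(f"read {total_read} elements, kept {total_kept}")
```

In this toy setup the group reads `tp_size` times more data than it keeps, so the redundant I/O grows linearly with the tensor parallel size.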