Ruoyu Qin
Hello! The current model-loading method is the same regardless of the tensor parallel (TP) size: each rank in a TP group reads the full weight file, and...