duan
As of 2024, the docs still have not been updated...
Thank you for your response. I have confirmed the presence of a straggler issue. As illustrated in the attached images, the first GPU remains idle, waiting for the second GPU...
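To make the straggler effect concrete, here is a minimal sketch in plain Python that computes how long each rank would sit idle waiting for the slowest one at a collective. The function name `straggler_report` and the timing numbers are hypothetical and only for illustration; real per-rank timings would have to come from a profiler or from timers placed around the collective call.

```python
from typing import Dict


def straggler_report(rank_times_ms: Dict[int, float]) -> Dict[str, object]:
    """Summarize how long each rank idles waiting for the slowest rank.

    rank_times_ms maps a rank id to the time (in ms) that rank needed to
    reach a synchronization point such as an all-reduce. The values are
    assumed to be measured elsewhere; this helper only does the arithmetic.
    """
    if not rank_times_ms:
        raise ValueError("need at least one rank timing")
    # The straggler is the rank that arrives last at the sync point.
    slowest_rank = max(rank_times_ms, key=rank_times_ms.get)
    slowest_ms = rank_times_ms[slowest_rank]
    # Every other rank idles for the gap between its arrival and the slowest.
    idle_ms = {rank: slowest_ms - t for rank, t in rank_times_ms.items()}
    return {"straggler": slowest_rank, "idle_ms": idle_ms}


if __name__ == "__main__":
    # Made-up numbers: rank 1 reaches the collective 30 ms after rank 0.
    print(straggler_report({0: 12.0, 1: 42.0}))
```

With the made-up timings above, rank 0 would be reported as idle for the full gap while rank 1 is flagged as the straggler, which mirrors the pattern described in the profiler images.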
> @duanzhaol I don't think you're using compilation are you?

Correct, I'm not using compile in my process. Is compilation a necessary step for tensor parallelism? I think it should...
I opted not to use compilation because my objective is to use tensor parallelism on a serverless platform. The initial compilation process is significantly time-consuming, which becomes impractical in our...
> @duanzhaol Out of curiosity, what level of overhead is acceptable?

Maybe less than a second? In a serverless setting, if the function is purely stateless, every request needs to recompile the...
Thank you for the detailed response. After modifying the warm-up code, the load time has significantly improved. On my machine, each pair of GPUs shares a single PCIe link, and...