Gu Wei

Results: 9 comments by Gu Wei

Why does class LlamaRMSNorm do `hidden_states = hidden_states.to(torch.float32)`? Why not just follow the type promotion rules of PyTorch ops?
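A likely reason for the explicit upcast is that promotion alone never upcasts when every operand is already low precision, so without the cast each RMSNorm step would run in bf16 or fp16. A minimal sketch, assuming a recent PyTorch on CPU, of two failure modes an fp32 cast avoids: squaring in fp16 overflows, and a small epsilon such as 1e-6 is rounded away when added in bf16. The tensor values are illustrative and not taken from the model.

```python
import torch

# Promotion only upcasts when dtypes are mixed: bf16 op bf16 stays bf16.
assert torch.promote_types(torch.bfloat16, torch.bfloat16) == torch.bfloat16

x = torch.full((4,), 300.0)  # fp32 by default

# fp16: 300 * 300 = 90000 exceeds the fp16 maximum (65504) and becomes inf.
x16 = x.to(torch.float16)
sq_fp16 = x16 * x16
sq_fp32 = x * x
print(torch.isinf(sq_fp16).all().item(), sq_fp32[0].item())  # True 90000.0

# bf16 keeps only 8 significant bits, so an epsilon of 1e-6 added to 1.0
# is lost; in fp32 it survives.
one = torch.tensor(1.0)
print((one.to(torch.bfloat16) + 1e-6).item() == 1.0, (one + 1e-6).item() == 1.0)  # True False
```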

`self.weight` is bf16 and `hidden_states` is fp32, and I found that these two methods return different dtypes.
Method 1: `return (self.weight * hidden_states).to(input_dtype)  # (bf16 * fp32).to(input_dtype)`
Method 2: `return self.weight...`
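Under PyTorch's promotion rules, `weight * h` takes the dtype `torch.promote_types(weight.dtype, h.dtype)`, so casting before or after the multiply can change the output dtype. The sketch below compares method 1 with a cast-then-multiply variant. That variant (`method_2_assumed`) is an assumption, since method 2 above is truncated, and the helper names are hypothetical.

```python
import torch

def rms_normalize_fp32(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Upcast first so the variance and rsqrt are computed in fp32.
    x32 = x.to(torch.float32)
    variance = x32.pow(2).mean(-1, keepdim=True)
    return x32 * torch.rsqrt(variance + eps)

def method_1(weight: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # Multiply in the promoted dtype, then cast back: result is always x.dtype.
    input_dtype = x.dtype
    return (weight * rms_normalize_fp32(x)).to(input_dtype)

def method_2_assumed(weight: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # ASSUMPTION: cast back first, then multiply. The result dtype is
    # promote_types(weight.dtype, x.dtype), not necessarily x.dtype.
    input_dtype = x.dtype
    return weight * rms_normalize_fp32(x).to(input_dtype)

torch.manual_seed(0)
for w_dtype, x_dtype in [(torch.bfloat16, torch.bfloat16),
                         (torch.bfloat16, torch.float16),
                         (torch.float32, torch.bfloat16)]:
    weight = torch.ones(8, dtype=w_dtype)
    x = torch.randn(2, 8).to(x_dtype)
    print(w_dtype, x_dtype, method_1(weight, x).dtype, method_2_assumed(weight, x).dtype)
```

With a bf16 weight and an fp16 input, method 1 returns fp16 while the variant returns fp32, because bf16 and fp16 promote to fp32; the same happens with an fp32 weight and a bf16 input.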

Is this a bug, or am I using it incorrectly? I didn't find any relevant guidance in the community documentation.

https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp#L3737
`// This means there is not yet a NCCL collective being called`
`// Here we have to use the best guesses and will use a single GPU to call...`

@mal @ezyang This minimal case reproduces a real scenario taken from our model training code. Does the comment in the code mean that we are using it incorrectly?

> The reason for the hang is complicated and yes, it is related to the code you refer to (guessing device).
>
> There are two ways to workaround it:...
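The two workarounds in the reply are truncated here. As a separate, hedged sketch: one commonly used way to keep ProcessGroupNCCL from guessing the device is to bind each rank to its GPU with `torch.cuda.set_device` before initializing, and to pass `device_ids` to `dist.barrier`. The helper name `init_and_barrier`, the file-based rendezvous, the rank-to-GPU mapping, and the gloo fallback (so it also runs on a CPU-only machine) are assumptions for illustration.

```python
import os
import tempfile

import torch
import torch.distributed as dist

def init_and_barrier(rank: int, world_size: int, init_file: str) -> str:
    """Hypothetical helper: init a process group and run a barrier without
    relying on a device guess. Returns the backend that was used."""
    use_nccl = torch.cuda.is_available() and dist.is_nccl_available()
    backend = "nccl" if use_nccl else "gloo"  # gloo fallback for CPU-only hosts
    if use_nccl:
        # ASSUMPTION: one process per GPU on a single node, rank -> GPU by modulo.
        torch.cuda.set_device(rank % torch.cuda.device_count())
    dist.init_process_group(backend, init_method=f"file://{init_file}",
                            rank=rank, world_size=world_size)
    if use_nccl:
        # Tell the barrier which GPU this rank owns instead of letting it guess.
        dist.barrier(device_ids=[torch.cuda.current_device()])
    else:
        dist.barrier()
    dist.destroy_process_group()
    return backend

if __name__ == "__main__":
    init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
    print(init_and_barrier(rank=0, world_size=1, init_file=init_file))
```

Under a launcher such as `torchrun`, each process would pass its own rank and the shared world size; the single-process call here only checks that the sequence runs.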