Gu Wei
Why does class LlamaRMSNorm do `hidden_states = hidden_states.to(torch.float32)`? Why not follow the type promotion rules of PyTorch ops?
`self.weight` is bf16 and `hidden_states` is fp32. I found that the dtypes produced by these two methods are different. Method 1: `return (self.weight * hidden_states).to(input_dtype)  # (bf16 * fp32).to(input_dtype)` Method 2: `return self.weight...`
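A small sketch of the promotion behavior being asked about. This assumes method 2 casts `hidden_states` back to the input dtype before the multiply (the post above is truncated, so that is a guess); the point is that bf16 * fp32 promotes to fp32, so method 1 multiplies in fp32 and downcasts once, while a cast-first variant multiplies entirely in bf16:

```python
import torch

w = torch.full((4,), 1.5, dtype=torch.bfloat16)  # stands in for self.weight (bf16)
h = torch.randn(4, dtype=torch.float32)          # hidden_states after the fp32 variance step
input_dtype = torch.bfloat16

# PyTorch type promotion: bf16 * fp32 -> fp32
prod = w * h
print(prod.dtype)  # torch.float32

# Method 1: multiply in fp32, then downcast the result once.
out1 = prod.to(input_dtype)

# Hypothetical method 2: downcast first, then multiply entirely in bf16.
out2 = w * h.to(input_dtype)

# Both end up bf16, but the rounding happens at different points,
# so the values can differ in the last bits.
print(out1.dtype, out2.dtype)
```

So the explicit `.to(torch.float32)` is not redundant with promotion: it controls *where* in the computation the precision is kept, not just the final dtype.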
I had the same problem and was very confused
Is this a bug or incorrect usage on my side? I didn't find any relevant guidance in the community documentation.
https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp#L3737

> // This means there is not yet a NCCL collective being called
> // Here we have to use the best guesses and will use a single GPU to call...
@mal @ezyang This minimal case is a real scenario extracted from training a model. Does the comment in the code mean it is being used incorrectly?
> The reason for the hang is complicated and yes, it is related to the code you refer to (guessing the device).
>
> There are two ways to work around it:...
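The reply above is truncated, so the exact workarounds it lists are not visible here. One common way to avoid the device-guessing path in `ProcessGroupNCCL` is to pin each rank to its GPU before the first collective, so NCCL never has to guess. A minimal sketch, assuming a `torchrun`-style launcher that sets `LOCAL_RANK` (the helper name is ours, not from the thread):

```python
import os
import torch
import torch.distributed as dist

def pick_device() -> int:
    # torchrun and similar launchers export LOCAL_RANK per process;
    # default to GPU 0 for single-process runs.
    return int(os.environ.get("LOCAL_RANK", "0"))

def init_distributed() -> None:
    # Pin this process to its GPU *before* init / the first collective,
    # so NCCL does not have to fall back to the "best guess" code path
    # linked above.
    torch.cuda.set_device(pick_device())
    dist.init_process_group(backend="nccl")
```

This only sketches the ordering constraint (set the device first, then initialize the process group); the actual fix for a given hang may depend on the launcher and PyTorch version.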