Nan Zhang

Results: 6 comments of Nan Zhang

> Hi, I'd suggest not using `GatheredParameters`; instead, take `torch.mean(param.ds_tensor)` and then do the gather yourself.

Hi! Could you explain how to do the gather myself? The parameters inside `ds_tensor` appear to be shuffled/flattened, and I still need to know where a given parameter sits in the original llama architecture (e.g., which layer of the mlp/self_attention it belongs to). Thanks!
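For anyone landing here later, here is a rough sketch of what "gather it yourself" could look like under ZeRO-3. It assumes the usual partitioned-parameter attributes (`param.ds_tensor`, `param.ds_numel`, `param.ds_shape`) and a standard `torch.distributed` setup; the padding handling is my assumption and is not verified against every DeepSpeed version:

```python
# Hedged sketch: manually gather a ZeRO-3 partitioned parameter.
# Assumes the usual ZeRO-3 attributes (param.ds_tensor, param.ds_numel,
# param.ds_shape); padding handling is an assumption.
import torch
import torch.distributed as dist

def gather_full_param(param):
    world_size = dist.get_world_size()
    shard = param.ds_tensor                       # local flat shard on this rank
    shards = [torch.empty_like(shard) for _ in range(world_size)]
    dist.all_gather(shards, shard)                # collect every rank's shard
    flat = torch.cat(shards)[: param.ds_numel]    # drop trailing padding
    return flat.view(param.ds_shape)              # restore the original shape
```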

Sorry for the late response! I used the recommended approach and gathered `param.ds_tensor` successfully. However, during the backward pass I still hit the same problem as before: the `loss.backward(retain_graph=True)` call in `grad_norm()` in lomo.py raises `RuntimeError: The size of tensor a (0) must match the size of tensor b (4096) at non-singleton dimension 1`. My guess is that DeepSpeed still cannot find these ds_tensor during backward. Is my understanding correct, or is there a way to work around this? Thanks!
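For reference, a minimal illustration (hypothetical module access, not code from lomo.py) of why a size-0 vs size-4096 mismatch can appear under ZeRO-3: outside a gathering context the full parameter is left as an empty placeholder, so anything recorded against it in a retained graph sees a size-0 tensor.

```python
# Hedged illustration (not from lomo.py): under ZeRO-3 the full parameter
# is an empty placeholder unless it is explicitly gathered.
import deepspeed

def show_placeholder(linear):          # `linear` is any ZeRO-3 partitioned nn.Linear
    p = linear.weight
    print(p.shape)                     # torch.Size([0]) while partitioned
    with deepspeed.zero.GatheredParameters([p]):
        print(p.shape)                 # full shape, e.g. torch.Size([4096, 11008])
```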

> Does that mean that you want to use those weights that have already been pruned?

Yes. Just want to locate those pruned weights in the original llama architecture (e.g.,...

> You can get the position of pruned parameters by replacing [Line 150](https://github.com/horseee/LLM-Pruner/blob/1455fa9646bc8b87ccbc613cf1b97e5729e06152/hf_prune.py#L150) in hf_prune.py by:
>
> ```
> for group in pruner.step(interactive=True):
>     print(group.details())
>     group.prune()
> ```

...
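A small variation of that quoted snippet that I find handy: dump the details to a file so the pruned positions can be parsed offline (the filename is arbitrary and not part of hf_prune.py):

```python
# Hedged variation of the quoted snippet: write group.details() to a file
# for later parsing. "pruned_details.txt" is an arbitrary name.
with open("pruned_details.txt", "w") as f:
    for group in pruner.step(interactive=True):
        f.write(group.details() + "\n")
        group.prune()
```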

> Not exactly. There's just a minor detail that needs to be corrected.
>
> Let's take this example: the down_proj Linear layer has in_features=11008 and out_features=4096, which in PyTorch,...
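For context, the PyTorch convention the quoted comment is pointing at: `nn.Linear` stores its weight as `(out_features, in_features)`, so for down_proj the weight is 4096 x 11008 and pruned *input* channels index dimension 1 of the weight.

```python
# Illustration of the nn.Linear weight layout: (out_features, in_features).
import torch.nn as nn

down_proj = nn.Linear(in_features=11008, out_features=4096, bias=False)
print(down_proj.weight.shape)  # torch.Size([4096, 11008])
```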

I tried to parse the string generated by `group.details()`. As a sanity check, I calculated the total number of pruned weights via `group.details()` and double-checked the actual number of...
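In case it helps anyone reproducing this, a very rough sketch of that sanity check; the regex is an assumption about what `group.details()` prints, not the actual torch-pruning output format:

```python
# Rough sketch of the sanity check: sum pruned counts parsed from
# group.details() and compare with the model's parameter count before/after
# pruning. The regex is an assumption about the details() text format.
import re

def count_pruned_from_details(details_str):
    total = 0
    for line in details_str.splitlines():
        m = re.search(r"(\d+)\s*$", line)   # assume a trailing count per line
        if m:
            total += int(m.group(1))
    return total

def num_params(model):
    return sum(p.numel() for p in model.parameters())
```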