Bird.Z

5 comments by Bird.Z

A NaN loss will cause the parameter update to fail; please check my modifications.
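To illustrate why a NaN loss breaks the update, here is a minimal sketch of a guard that skips the step when the loss is not finite. It is not the modification referred to above, which is not shown here; `safe_step`, its arguments, and the plain SGD rule are hypothetical and chosen only for illustration.

```python
import math

def safe_step(weights, grads, loss, lr=0.1):
    """Hypothetical helper: apply one SGD step only if the loss is finite.

    Returns the (possibly unchanged) weights and a flag saying whether
    the update was applied.
    """
    # A NaN or inf loss means the gradients are unusable, so applying
    # them would write NaN/inf into every weight.
    if not math.isfinite(loss):
        return list(weights), False
    # Plain SGD update, assumed here only to keep the sketch short.
    return [w - lr * g for w, g in zip(weights, grads)], True

new_weights, applied = safe_step([1.0, -2.0], [0.5, 0.5], float("nan"))
print(applied, new_weights)
```

Skipping the step keeps the weights usable, but it does not remove the cause of the NaN, so the source of the non-finite loss still has to be fixed.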

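> After quantization, the inference speed of BELLE-7B (bloom) drops significantly. The inference speed of BELLE-7B (LLaMA) also drops somewhat after quantization. > > Code: > > ``` > import time > > import torch > import torch.nn as nn > > from gptq import * > from modelutils...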

> It seems that we need a hierarchical pruning scheme for GQA: group pruning, plus head pruning inside each group? Since we need to keep the number of heads in each...

> Pruning queries might cause the number of queries to be different in different groups. So maybe a group-based pruning is more reasonable? @zhangzhenyu13 Yes, your settings are right. We...

RoPE is in fact not trained; it consists of fixed tensors registered as buffers. It is fine to apply the default RoPE settings without any modifications.
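As a rough illustration of why RoPE carries no trainable weights, the sketch below builds the cos/sin tables from position, head dimension, and a base frequency alone. The function names, the base of 10000, and the interleaved pairing of dimensions are assumptions made for this example; real implementations may pair dimensions differently and store the tables as buffers on the module.

```python
import math

def rope_tables(seq_len, head_dim, base=10000.0):
    """Build cos/sin tables; nothing here is learned (names are hypothetical)."""
    # Inverse frequencies depend only on head_dim and base.
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(seq_len)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(seq_len)]
    return cos, sin

def apply_rope(vec, pos, cos, sin):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by the angle for `pos`."""
    out = []
    for i in range(len(vec) // 2):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        c, s = cos[pos][i], sin[pos][i]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out

cos, sin = rope_tables(seq_len=4, head_dim=8)
query = [1.0, 0.0, 0.5, -0.5, 2.0, 1.0, 0.0, 3.0]
print(apply_rope(query, 2, cos, sin))
```

Because the tables are a deterministic function of these settings, rebuilding them with the defaults after pruning or fine-tuning gives the same values, as long as the head dimension is unchanged.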