ZhiYuanZeng
It seems that we need a hierarchical pruning scheme for GQA: group pruning, plus head pruning inside each group? Since we need to keep the number of heads in each group...
Could we share the mask of query-heads among different groups? > Pruning queries might cause the number of queries to be different in different groups. So maybe a group-based pruning...
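A minimal sketch of what a shared query-head mask could look like, assuming query head `h` belongs to KV group `h // heads_per_group` (the contiguous layout is an assumption, not something confirmed in this thread). The function name `kept_query_heads` and both mask arguments are hypothetical, only meant to illustrate the idea: one mask selects whole groups, and a single in-group mask is reused for every group so that all surviving groups keep the same number of query heads.

```python
def kept_query_heads(group_mask, shared_head_mask):
    """Return indices of query heads that survive hierarchical pruning.

    group_mask: one 0/1 entry per KV group (1 = keep the whole group).
    shared_head_mask: one 0/1 entry per query head inside a group,
        reused for every group so all kept groups have the same size.
    Assumes query head h belongs to group h // heads_per_group.
    """
    heads_per_group = len(shared_head_mask)
    kept = []
    for g, keep_group in enumerate(group_mask):
        if not keep_group:
            continue  # the whole group (KV head and its queries) is pruned
        for i, keep_head in enumerate(shared_head_mask):
            if keep_head:
                kept.append(g * heads_per_group + i)
    return kept


# 4 groups of 2 query heads each; drop group 1 and the second head of every group.
kept = kept_query_heads(group_mask=[1, 0, 1, 1], shared_head_mask=[1, 0])
print(kept)  # [0, 4, 6]
```

Because the in-group mask is shared, every kept group ends up with exactly `sum(shared_head_mask)` query heads, which is the property GQA needs.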
Yes, RoPE is parameter-free, but its base is often tuned to support long-context extrapolation. The base in ComposerMosaicLlama is fixed at 10000. This configuration works well...
But it is better to set the RoPE base from the config file rather than loading it from the checkpoint. > I found that the rope params are ignored in composer_to_hf.py and...
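To illustrate the point about reading the base from the config, here is a rough sketch. The helper names and the plain-dict config are hypothetical; `rope_theta` is the key Hugging Face's Llama config uses for the base, and whether this repo's config exposes the same key is an assumption. The inverse frequencies follow the standard RoPE formula `1 / base^(2i / head_dim)`, so nothing needs to be stored in the checkpoint.

```python
def rope_base_from_config(config, default=10000.0):
    """Read the RoPE base from a config dict, falling back to 10000."""
    # "rope_theta" is an assumed key name for this sketch.
    return float(config.get("rope_theta", default))


def rope_inv_freq(head_dim, base=10000.0):
    """Inverse frequencies for RoPE: 1 / base^(2i / head_dim), i = 0 .. head_dim/2 - 1."""
    return [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]


# Default base when the config does not set one.
default_freqs = rope_inv_freq(4, rope_base_from_config({}))

# A larger base (e.g. for long-context tuning) picked up from the config.
tuned_freqs = rope_inv_freq(4, rope_base_from_config({"rope_theta": 1000000.0}))
```

Since the frequencies are fully determined by `head_dim` and the base, recomputing them from the config avoids silently inheriting a stale value from a converted checkpoint.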