ZhiYuanZeng comments

Results: 4 comments of ZhiYuanZeng

The Project is not implemented for 70B llama?

It seems we need a hierarchical pruning scheme for GQA: group pruning, plus head pruning inside each group? Since we need to keep the number of heads in each group...

The Project is not implemented for 70B llama?

Could we share the query-head mask across the different groups? > Pruning queries might cause the number of queries to differ between groups. So maybe a group-based pruning...
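To make the shared-mask idea concrete, here is a minimal sketch in plain Python. It assumes a two-level mask: one entry per KV group, and one query-head mask that every group reuses. The function names and the 4-group, 4-heads-per-group layout are hypothetical and are not taken from the project's code.

```python
def hierarchical_gqa_mask(group_mask, shared_head_mask):
    """Combine a per-group mask with one query-head mask shared by all groups.

    group_mask[g] == 0 prunes the whole KV group g (and its query heads).
    shared_head_mask[h] == 0 prunes query head h inside every group.
    """
    return [[g * h for h in shared_head_mask] for g in group_mask]


def kept_counts_in_surviving_groups(masks):
    """Number of kept query heads in each group that was not fully pruned."""
    return [sum(row) for row in masks if any(row)]


# Hypothetical layout: 4 KV groups, 4 query heads per group.
group_mask = [1, 0, 1, 1]        # drop the second group entirely
shared_head_mask = [1, 0, 1, 1]  # drop the second query head in every group

masks = hierarchical_gqa_mask(group_mask, shared_head_mask)
counts = kept_counts_in_surviving_groups(masks)
```

Because the head-level mask is shared, every surviving group keeps the same number of query heads, which is the constraint raised in the comment. Whether sharing the mask hurts model quality is a separate question that this sketch does not address.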

Why the rope params are ignored while converting hf checkpoint to composer checkpoint?

Yes, RoPE is parameter-free, but its base is often tuned to support long-context extrapolation. The base in ComposerMosaicLlama is fixed at 10000. This configuration works well...
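For illustration, the sketch below shows why the base matters even though RoPE has no learned weights. It uses the standard RoPE inverse-frequency formula, base^(-2i/d). The function name, the head dimension of 128, and the larger base of 500000 are assumptions chosen for the example, not values read from the project.

```python
def rope_inv_freq(head_dim, base=10000.0):
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim) for each pair i."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


# 10000 is the fixed value mentioned in the comment; 500000 is an illustrative larger base.
default_freqs = rope_inv_freq(128, base=10000.0)
tuned_freqs = rope_inv_freq(128, base=500000.0)

# A larger base never raises any rotation frequency, so wavelengths get longer.
all_slower = all(t <= d for t, d in zip(tuned_freqs, default_freqs))
```

If a checkpoint was trained with a base other than 10000, rebuilding the frequencies with the fixed default yields different rotation angles from those the model saw during training.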

Why the rope params are ignored while converting hf checkpoint to composer checkpoint?

But it is better to set the RoPE base from the config file rather than loading it from the checkpoint. > I found that the rope params are ignored in composer_to_hf.py and...