KyrieMing

Results: 18 comments by KyrieMing

What does PVP mean?

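I don't think it's necessary to add a pooler to the model either. The pooler is just a vectorization strategy that Su Shen (苏神) added when running that sim experiment; adding a conditional filter in the conversion script is more convenient.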

Hello, I recently read a collection of instruction-tuning papers such as Flan, T0, and NIV2, and I agree with your idea that a pre-trained model should first be instruction-tuned with open...

> > Thanks DaoD. This project has already converted the ckpt into Megatron/GPT-NeoX format. I'm curious about how you used HF for validation. ...

> Hi, have you solved the problem? I ran into the same problem. When using Megatron, it seems you must load the ZeRO optimizer states. Keeping the MP size the same, the GPU count could be...

> I converted the parameters to Hugging Face format without the ds-zero-states, and it works well. Why must GPT-NeoX load the zero-states?

> Can you explain why this isn’t the desired behavior? Does GPT-NeoX 2.0 not support fine-tuning a model with a different number of GPUs? I pretrained a 6B model using GPT-NeoX 2.0 with 256...

> > Can you explain why this isn’t the desired behavior? > > Does GPT-NeoX 2.0 not support fine-tuning a model with a different number of GPUs? I pretrained a 6B model using GPT-NeoX...