KyrieMing
What does PVP mean?
There's also no real need to add a pooler inside the model; it was only something Su (苏神) added as a vectorization strategy for the sim experiment. Adding a conditional filter in the conversion script is more convenient.
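The filtering approach above can be sketched as follows. This is a minimal illustration, not the actual conversion script: the key names and the `filter_pooler_weights` helper are hypothetical, assuming pooler parameters can be identified by name in the checkpoint's state dict.

```python
# Hypothetical sketch: rather than adding a pooler module to the model,
# drop pooler weights with a condition while converting the checkpoint.
# Key names here are made up for illustration.

def filter_pooler_weights(state_dict):
    """Return a copy of the state dict without pooler parameters."""
    return {k: v for k, v in state_dict.items() if "pooler" not in k}

ckpt = {
    "encoder.layer.0.attention.weight": [0.1],
    "pooler.dense.weight": [0.2],
    "pooler.dense.bias": [0.3],
}
filtered = filter_pooler_weights(ckpt)
# Only the encoder weight survives the filter.
```

This keeps the model definition untouched and confines the special case to the conversion step.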
> I solved this by setting tensor parallel = 2
Hello, I recently read a collection of instruction-tuning papers like Flan, T0, NIV2. And I agree with your idea that a pre-trained model should first be instruction-tuned with open...
> > Thanks DaoD. This project has already converted the ckpt into Megatron/GPT-NeoX format. I'm curious about how you used HF for validation. ...
> Hi, have you solved the problem? I'm running into the same problem. If using Megatron, it seems you must load the ZeRO optimizer states. As long as the MP size stays the same, the GPU count could be...
> I converted the parameters to Hugging Face format without the DeepSpeed ZeRO states, and it works well. Why must gpt-neox load the ZeRO states?
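The point in the comment above can be sketched generically. Assuming a checkpoint layout where model weights and optimizer state live under separate keys (the `"module"`/`"optimizer"` names below are assumptions for illustration, not a guaranteed DeepSpeed schema), export only needs the weights; ZeRO optimizer partitions matter only for resuming training.

```python
# Minimal sketch under an assumed checkpoint layout: keep the model
# parameters and discard optimizer/ZeRO state before exporting to HF.

def extract_model_weights(ds_checkpoint):
    # Optimizer and ZeRO partitions are only needed to resume training;
    # inference/export requires just the parameter tensors.
    return ds_checkpoint["module"]

ckpt = {
    "module": {"embed.weight": [1.0], "layer.0.mlp.weight": [2.0]},
    "optimizer": {"zero_stage": 1, "partitions": []},
}
weights = extract_model_weights(ckpt)
```

This is why a conversion to Hugging Face format can succeed without the ZeRO states, while resuming GPT-NeoX training cannot.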
> Can you explain why this isn’t the desired behavior? Does GPT-NeoX 2.0 not support fine-tuning a model with a different number of GPUs? I pre-trained a 6B model using GPT-NeoX 2.0 with 256...