KyrieMing
What does PVP mean?
There's also no real need to add a pooler inside the model; it was only something Su (苏神) added as a vectorization strategy for the sim experiment. Adding a conditional filter in the conversion script is more convenient.
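The filtering approach above can be sketched as follows. This is a minimal illustration, not the actual conversion script: the key names and the `filter_pooler_weights` helper are hypothetical, assuming pooler parameters can be identified by name in the checkpoint's state dict.

```python
# Hypothetical sketch: rather than adding a pooler module to the model,
# drop pooler weights with a condition while converting the checkpoint.
# Key names here are made up for illustration.

def filter_pooler_weights(state_dict):
    """Return a copy of the state dict without pooler parameters."""
    return {k: v for k, v in state_dict.items() if "pooler" not in k}

ckpt = {
    "encoder.layer.0.attention.weight": [0.1],
    "pooler.dense.weight": [0.2],
    "pooler.dense.bias": [0.3],
}
filtered = filter_pooler_weights(ckpt)
# Only the encoder weight survives the filter.
```

This keeps the model definition untouched and confines the special case to the conversion step.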
> I solved this by setting tensor parallel = 2
Hello, I recently read a collection of instruction-tuning papers like Flan, T0, NIV2. And I agree with your idea that a pre-trained model should first be instruction-tuned with open...
> > Thanks DaoD. This project has already converted the ckpt into Megatron/GPT-NeoX format. I'm curious about how you used HF for validation. ...
> Hi, have you solved the problem? I'm running into the same problem. If using Megatron, it seems you must load the ZeRO optimizer states. As long as the MP size stays the same, the GPU count could be...
> I converted the parameters to Hugging Face format without the DeepSpeed ZeRO states, and it works well. Why must gpt-neox load the ZeRO states?
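The point in the comment above can be sketched generically. Assuming a checkpoint layout where model weights and optimizer state live under separate keys (the `"module"`/`"optimizer"` names below are assumptions for illustration, not a guaranteed DeepSpeed schema), export only needs the weights; ZeRO optimizer partitions matter only for resuming training.

```python
# Minimal sketch under an assumed checkpoint layout: keep the model
# parameters and discard optimizer/ZeRO state before exporting to HF.

def extract_model_weights(ds_checkpoint):
    # Optimizer and ZeRO partitions are only needed to resume training;
    # inference/export requires just the parameter tensors.
    return ds_checkpoint["module"]

ckpt = {
    "module": {"embed.weight": [1.0], "layer.0.mlp.weight": [2.0]},
    "optimizer": {"zero_stage": 1, "partitions": []},
}
weights = extract_model_weights(ckpt)
```

This is why a conversion to Hugging Face format can succeed without the ZeRO states, while resuming GPT-NeoX training cannot.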
> Can you explain why this isn’t the desired behavior? Does GPT-NeoX 2.0 not support fine-tuning a model with a different number of GPUs? I pre-trained a 6B model using GPT-NeoX 2.0 with 256...