kevinpro
> @Ricardokevins Oh nice that you fixed it! Can I ask for some advice since I'm still facing the issue:
>
> * What do you mean by "replaced the...
> Thanks @Ricardokevins! Btw I've fixed the issue by setting the number of threads used for intraop parallelism to 1:
>
> ```
> torch.set_num_threads(1)
> ```
>
> This...
Sounds like there's no EOS token between packed sequences.
Hi, thank you for your great work! May I ask how much VRAM is needed? I tried with 8*40G, but failed with OOM.
> > Hi, thank you for your great work! May I ask how much VRAM is needed? I tried with 8*40G, but failed with OOM.
>
> 8x80G, 8*40G only...
I'm really curious about how well it performs... someone will probably run evaluations soon.
> I've looked everywhere; where does it say 16B? I can't read the Chinese in the README.
Any update on this issue? @hiyouga
Same issue here with ZeRO-3 training. What should I do to solve it? @hijkzzz
https://github.com/facebookresearch/llama/blob/6796a91789335a31c8309003339fe44e2fd345c2/llama/model.py#L348

```
def forward(self, x):
    return self.w2(F.silu(self.w1(x)) * self.w3(x))
```
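For reference, the linked `forward` is LLaMA's SwiGLU feed-forward block: a SiLU-gated branch (`w1`) elementwise-multiplied with a linear branch (`w3`), then projected back down by `w2`. A minimal NumPy sketch of the same computation (the weight names mirror the linked code, but the matrix shapes and helper names here are my own illustrative choices, not taken from the repo):

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x), written in a numerically direct form
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w1, w2, w3):
    # Same structure as the linked forward: w2(silu(w1(x)) * w3(x)),
    # with the linear layers applied as bias-free matrix products.
    return (silu(x @ w1) * (x @ w3)) @ w2

rng = np.random.default_rng(0)
d_model, d_hidden = 4, 8          # toy sizes; the real model uses much larger dims
x = rng.standard_normal((2, d_model))
w1 = rng.standard_normal((d_model, d_hidden))   # gate projection
w3 = rng.standard_normal((d_model, d_hidden))   # value projection
w2 = rng.standard_normal((d_hidden, d_model))   # down projection
y = swiglu_ffn(x, w1, w2, w3)
print(y.shape)  # (2, 4)
```

The gating is why there are three weight matrices instead of the two in a classic MLP: `w1`/`w3` both expand to the hidden size, and only their product is projected back.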