Frank Qi

11 comments by Frank Qi

> You have to downgrade your version of the peft package to 0.2.0 for now. The new version is broken. See #1253 I'm facing the same issue. Just notice that...

> Yes. I have encountered this problem TWICE. Each time I didn't make any other changes at all; I just installed peft==0.2.0 in place of 0.3.0.dev. And for that reason, I have...

Yes. For now, if I use a lower version of peft, it reports the same error you saw. But if I upgrade peft to the latest version (0.3.0.dev0 at the moment), I can't save...

> New idea: the training finally works now. Setting fp16=False makes training super slow and not memory-friendly. > > To avoid "ValueError: Attempting to unscale FP16 gradients",...
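For context on the error in that quote: "Attempting to unscale FP16 gradients" typically appears when the model parameters themselves are float16 while a grad scaler is in use. A minimal sketch of the usual workaround, assuming plain PyTorch autocast (on CPU with bfloat16 as a stand-in for the fp16 GPU case, not the thread's exact Trainer setup): keep the trainable weights in float32 and let autocast handle the low-precision casting of activations.

```python
import torch
import torch.nn as nn

# Sketch (assumption: plain PyTorch; CPU bfloat16 autocast stands in for the
# fp16 GPU case). The trainable parameters stay float32; autocast only casts
# the activations, which is the setup that avoids the
# "Attempting to unscale FP16 gradients" failure mode.
model = nn.Linear(8, 2)                     # parameters default to float32
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).sum()                   # forward runs in low precision
loss.backward()                             # grads land in the params' dtype
opt.step()

# Master weights were never converted to half precision.
assert all(p.dtype == torch.float32 for p in model.parameters())
```

The design point is that mixed precision wants float32 master weights; converting the whole model to fp16 before training is what triggers the unscale error.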

My config:
- model size: 7B
- dataset: 10k rows of instructions
- device: 8x A100-80G, with 50 CPU cores and 500 GB of RAM

Also KILLED. Damn... so sad.

> > > my config: model size: 7B, dataset: 10k rows of instructions, device: 8x A100-80G with 50 CPU cores and 500 GB of RAM. > > Also KILLED. Damn... so...

I found that it is slow on the first load, but for me it finishes within minutes. You can also check whether your CUDA environment is working normally. BTW,...

> > I found that it will be slow if first load. But for me, it will finish within minutes. Or you can check if your cuda environment works as...

> Sorry to bother. transformers will load the model in float32 by default. Users have to set the dtype when loading, or call half(), to obtain a float16 model (in this...
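The default-dtype point in that quote can be demonstrated on any torch module. A sketch, using a plain `nn.Linear` rather than a full transformers checkpoint: parameters are created in float32, and `half()` converts them in place (with transformers you would instead pass `torch_dtype=torch.float16` to `from_pretrained`).

```python
import torch
import torch.nn as nn

# Sketch of the dtype point above, using a plain module instead of a real
# checkpoint. PyTorch creates parameters in float32 by default; calling
# .half() converts parameters and buffers to float16 in place. With
# transformers, the equivalent at load time is
# from_pretrained(..., torch_dtype=torch.float16).
model = nn.Linear(16, 4)
print(model.weight.dtype)   # torch.float32 (the default)

model.half()                # in-place conversion to float16
print(model.weight.dtype)   # torch.float16
```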

Same problem. My generation config: do_sample=True, beams>1 (using beam-sample). It goes back to normal when I set beams=1 (using sample). I didn't try setting do_sample to False, but it seems to work in others'...
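For reference, the two settings being compared map onto the transformers generation API, where "beams" is the `num_beams` parameter. A minimal sketch of the two configs (assuming the standard `GenerationConfig` class; no model is loaded here):

```python
from transformers import GenerationConfig

# Sketch of the two settings discussed above (assumption: standard
# transformers generation API; "beams" in the comment maps to num_beams).
# do_sample=True with num_beams > 1 selects beam-sample decoding, which the
# comment reports as problematic; num_beams=1 falls back to plain sampling.
beam_sample_cfg = GenerationConfig(do_sample=True, num_beams=4)
sample_cfg = GenerationConfig(do_sample=True, num_beams=1)

print(beam_sample_cfg.num_beams)  # 4
print(sample_cfg.num_beams)       # 1
```

Either config can then be passed to `model.generate(..., generation_config=cfg)`.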