lhtpluto comments

Results: 39 comments by lhtpluto

OutOfMemoryError: CUDA out of memory.

> > Edit sft.yaml, deepspeed_config: gradient_accumulation_steps: 1 gradient_clipping: 1.0 offload_optimizer_device: cpu offload_param_device: cpu zero3_init_flag: true zero3_save_16bit_model: true zero_stage: 3
> >
> > Offload DeepSpeed to the CPU.
> >
> > Problem solved.
> >
> > Hardware environment: RTX 6000 ADA...
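The settings quoted above can be written out as a config fragment. This is a minimal sketch, assuming the keys nest under `deepspeed_config` in sft.yaml as the comment suggests. The `render_yaml` helper is hypothetical and only handles a flat mapping like this one.

```python
# Keys and values are copied from the quoted comment; the nesting under
# `deepspeed_config` is an assumption about how sft.yaml is laid out.
DEEPSPEED_CONFIG = {
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 1.0,
    "offload_optimizer_device": "cpu",
    "offload_param_device": "cpu",
    "zero3_init_flag": True,
    "zero3_save_16bit_model": True,
    "zero_stage": 3,
}


def render_yaml(section: str, values: dict) -> str:
    """Render one flat mapping as a YAML section (hypothetical helper)."""
    lines = [f"{section}:"]
    for key, value in values.items():
        # YAML spells booleans in lowercase, unlike Python's True/False.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"  {key}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_yaml("deepspeed_config", DEEPSPEED_CONFIG))
```

The two `offload_*_device: cpu` entries are the ones the comment credits with fixing the out-of-memory error. The other values are reproduced as quoted.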

OutOfMemoryError: CUDA out of memory.

> How do I fix this error when it shows up during inference? moss_cli_demo.py RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 31.75 GiB total capacity; 30.01 GiB already allocated; 548.00 MiB free; 30.02 GiB reserved in...

Error during fine-tuning: FileNotFoundError: No such file or directory: './sft_data/train.jsonl'

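If the fine-tuning data contains only questions, how can MOSS be made to give the answers I want? "user_prompt" is the label for the question, so what is the label for the answer? Also, the jsonl shipped with the program cannot be used for fine-tuning; it fails with KeyError: 'chat'. After running the script from #282, fine-tuning works normally.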
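A quick way to see which records would trigger the KeyError is to scan the file before training. This is a minimal sketch, assuming the error comes from records that lack a top-level `chat` field. The function name and the sample records are hypothetical and do not reproduce the real MOSS data format.

```python
import json


def find_records_missing_key(lines, key="chat"):
    """Return the 1-based line numbers of JSONL records that lack `key`."""
    missing = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if key not in record:
            missing.append(number)
    return missing


if __name__ == "__main__":
    # Hypothetical sample records, only for illustration.
    sample = [
        '{"chat": {}}',
        '{"user_prompt": "hello"}',
    ]
    print(find_records_missing_key(sample))
```

To check a real file, pass an open file handle, for example `find_records_missing_key(open("./sft_data/train.jsonl", encoding="utf-8"))`.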

Error during fine-tuning: FileNotFoundError: No such file or directory: './sft_data/train.jsonl'

> @lhtpluto What is the 282 script?

It contains a .py program; running it generates the SFT training data.

Error during fine-tuning: FileNotFoundError: No such file or directory: './sft_data/train.jsonl'

> > > @lhtpluto What is the 282 script?
> > >
> > > It contains a .py program; running it generates the SFT training data.
> >
> > @lhtpluto
> > https://github.com/OpenLMLab/MOSS/blob/4ab9c7874f5251135ccc19b5b2e1470c6b53a628/finetune_moss.py#L282
> >
> > Huh? Which file?

https://github.com/OpenLMLab/MOSS/issues/282

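Running the fine-tuning code on eight 40 GB A100s with bsz=1 fails with an out-of-memory error. What are the minimum hardware requirements for training?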

> OK, offloading to the CPU in DeepSpeed fixed it. Thank you very much.

Offloading DeepSpeed to the CPU works.

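Running the fine-tuning code on eight 40 GB A100s with bsz=1 fails with an out-of-memory error. What are the minimum hardware requirements for training?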

> > This configuration probably cannot run with a maximum length of 2048. You could try shortening the input length.
> >
> > When fine-tuning, how do I change it to 1024?

Try finding the following statements in finetune_moss.py and changing the 2048 to 1024: if len(input_ids + cur_turn_ids) > 2048: break input_ids.extend(cur_turn_ids) no_loss_spans.extend(cur_no_loss_spans) if len(input_ids) == len(instruction_ids): continue assert len(input_ids) > 0 and len(input_ids)
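The effect of changing that constant can be sketched in isolation. This is not the code from finetune_moss.py: the `pack_turns` function, its `max_len` parameter, and the shape of `turns` are assumptions made for illustration. Only the length test and the two `extend` calls follow the quoted statements.

```python
def pack_turns(instruction_ids, turns, max_len=1024):
    """Append whole turns until the next one would exceed max_len.

    `turns` is a list of (cur_turn_ids, cur_no_loss_spans) pairs; this shape
    is an assumption made for the sketch.
    Returns (input_ids, no_loss_spans), or None when no turn fits.
    """
    input_ids = list(instruction_ids)
    no_loss_spans = []
    for cur_turn_ids, cur_no_loss_spans in turns:
        # Same test as the quoted snippet, with the limit as a parameter.
        if len(input_ids + cur_turn_ids) > max_len:
            break
        input_ids.extend(cur_turn_ids)
        no_loss_spans.extend(cur_no_loss_spans)
    if len(input_ids) == len(instruction_ids):
        return None  # nothing fit; the quoted code skips such samples
    return input_ids, no_loss_spans
```

With a limit of 1024, a conversation is cut at the last turn that still fits, so long dialogues lose their later turns rather than being truncated mid-turn.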

Question: RuntimeError: `<class 'models.quantization.QuantLinear'>' was not properly set up for sharding by zero.Init(). A subclass of torch.nn.Module must be defined before zero.Init() where an instance of the class is created.

I ran into this problem too. I want to train the quantized model, but it fails with RuntimeError: `' was not properly set up for sharding by zero.Init(). A subclass of torch.nn.Module must be defined before zero.Init() where an instance of the class is created.

Question: RuntimeError: `<class 'models.quantization.QuantLinear'>' was not properly set up for sharding by zero.Init(). A subclass of torch.nn.Module must be defined before zero.Init() where an instance of the class is created.

> train.jsonl cannot be found

See #282: https://github.com/OpenLMLab/MOSS/issues/282

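The pytorch_model.bin produced by training with the fine-tuning code is 62 GB. Is there a way to split it into multiple files that still match the format the inference code loads?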
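One common layout for split checkpoints is the sharded format written by Hugging Face transformers, where `save_pretrained` accepts a `max_shard_size` argument and writes an index file that maps each tensor to a shard. Whether the MOSS inference code loads that layout is not confirmed in this thread. The sketch below only illustrates the grouping idea with plain Python; `plan_shards`, `build_index`, and the sample sizes in the usage note are hypothetical.

```python
def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily group tensors, in order, into shards of at most max_shard_bytes.

    A single tensor larger than the limit gets a shard of its own.
    """
    shards, current, current_bytes = [], [], 0
    for name, size in tensor_sizes.items():
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


def build_index(tensor_sizes, shards):
    """Build an index that maps each tensor name to its shard file."""
    total = len(shards)
    weight_map = {}
    for i, names in enumerate(shards, start=1):
        filename = f"pytorch_model-{i:05d}-of-{total:05d}.bin"
        for name in names:
            weight_map[name] = filename
    return {
        "metadata": {"total_size": sum(tensor_sizes.values())},
        "weight_map": weight_map,
    }
```

For example, `plan_shards({"a": 6, "b": 5, "c": 4}, 10)` puts `a` in the first shard and `b` and `c` in the second, and `build_index` then records which file holds each name.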

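> _No description provided._

Why does the official program produce a .pt file for me? My training file was generated with the method from https://github.com/OpenLMLab/MOSS/issues/282; I did not use the official 2.5 GB dataset.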