xianghuisun

68 comments by xianghuisun

See also: https://github.com/LianjiaTech/BELLE/issues/91 and https://github.com/LianjiaTech/BELLE/issues/122

> Thanks for your reply. One more question: can finetuning be continued on a quantized model?

In theory, a quantized model can still be finetuned. However, we have not run experiments on this yet.
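To make the "in theory" part concrete, here is a minimal, hypothetical sketch (not BELLE code) of symmetric absmax int8 quantization. It shows why a quantized model can still participate in training: the int weights are dequantized to floats for the forward pass, so gradients can flow to separately kept full-precision parameters (e.g. LoRA adapters), while the reconstruction error stays bounded by the quantization scale.

```python
# Hypothetical illustration of int8 absmax quantization, not the BELLE
# implementation. Function names are our own.

def quantize_absmax(weights, bits=8):
    """Symmetric absmax quantization of a list of floats to signed ints."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for the forward pass."""
    return [v * scale for v in q]

w = [0.12, -0.54, 0.33, -0.08]
q, scale = quantize_absmax(w)
w_hat = dequantize(q, scale)
# rounding error is at most scale / 2 per weight
err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The rounding error per weight is bounded by `scale / 2`, which is why the dequantized forward pass is a close enough approximation for further finetuning of small full-precision adapters.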

> _No description provided._

See train/FAQ.md. Full-parameter finetuning requires 8x A100 40G; with LoRA, a single A100 40G is enough.
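A back-of-the-envelope estimate (our own arithmetic, not from the repo) of why those hardware numbers differ so much: with Adam, full finetuning keeps fp32 master weights, momentum, and variance plus fp16 weights and gradients, roughly 16 bytes per parameter, while LoRA only trains small low-rank adapters. The layer count and hidden size below are assumptions for a BLOOM-7B-class model.

```python
# Rough memory arithmetic for full finetuning vs. LoRA; all numbers are
# illustrative estimates, not measurements.

def full_finetune_gb(n_params, bytes_per_param=16):
    """Approximate training state in GB: fp32 weights + Adam moments
    + fp16 weights and gradients, ~16 bytes per parameter."""
    return n_params * bytes_per_param / 1e9

def lora_trainable_params(n_layers, d_model, rank, matrices_per_layer=4):
    """Two low-rank factors (d_model x r and r x d_model) per adapted
    weight matrix, for a few matrices per transformer layer."""
    return n_layers * matrices_per_layer * 2 * d_model * rank

full_gb = full_finetune_gb(7e9)   # ~112 GB of state, hence multiple GPUs
lora_p = lora_trainable_params(n_layers=30, d_model=4096, rank=8)
```

With ~112 GB of optimizer and weight state, a 7B full finetune cannot fit on one 40 GB card, while the LoRA adapters come to only a few million trainable parameters, so a single A100 40G suffices.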

> …igscience/bloomz-7b1-mt", "data_path": "data/res/merge_data.json", "output_dir": "trained_models/bloom", "per_device_train_batch_size": 1, "num_epochs": 2, "learning_rate": 1e-5, "cutoff_len": 1024, "val_set_size": 1000, "val_set_rate": 0.1, "save_steps": 1000, "eval_steps": 1000, "logging_steps": 1, "gradient_accumulation_steps": 32 } deepspeed…

We will find time to try to reproduce this issue (the warning "tried to get lr value before scheduler/optimizer started stepping, returning lr=0", Issue #134).
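A plausible explanation for the warning, sketched below under our own assumptions (this is not a confirmed diagnosis): with `gradient_accumulation_steps: 32`, the optimizer and LR scheduler only step once every 32 micro-batches, but `logging_steps: 1` reads the learning rate on every micro-batch, so the first logs query the scheduler before it has ever stepped.

```python
# Toy simulation of gradient accumulation vs. per-micro-batch logging;
# function and variable names are illustrative, not from the training code.

def scheduler_step_indices(total_micro_batches, grad_accum):
    """Micro-batch indices at which the optimizer/scheduler actually step."""
    return [i for i in range(1, total_micro_batches + 1) if i % grad_accum == 0]

steps = scheduler_step_indices(total_micro_batches=64, grad_accum=32)
first_real_step = steps[0]   # the LR is only defined from this point on;
                             # logging before it would see the warning (lr=0)
```

Until micro-batch 32, a logger asking for the current LR on every step would trigger exactly the "returning lr=0" warning, which is harmless if it disappears after the first optimizer step.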