cywjava comments

91 comments by cywjava

How to train with model parallelism

See here: https://github.com/chenyiwan/chatglm-6b-fine-tuning

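[BUG/Help] <Why does the Web_demo from P-tuning lose the model's original conversational ability after fine-tuning?>

I think it's a problem with the optimizer and the learning rate.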

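[BUG/Help] Why doesn't the README have a tutorial on multi-GPU P-tuning, or at least one sentence explaining what happens? This problem has been bothering a lot of people.

You'll have to work this one out yourself.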

[Feature] Created a branch that supports multi-GPU deployment and automatically splits GPU memory evenly.

> I ran into the same error too: Expected all tensors to be on the same device, but found at least two devices... > ## Training works with either 2 or 4 GPUs. I've solved it. In the screenshot, GPUs 0 and 1 are used for training and GPU 7 for text generation. ![image](https://user-images.githubusercontent.com/56297473/229975256-4f063493-5543-43a5-a8e7-d72c19a26fb0.png) Training on 4 GPUs also works fine. ![image](https://user-images.githubusercontent.com/56297473/229976405-6f4db387-81d0-47e2-8487-93cb2208e7c9.png) Direct link ===> https://github.com/chenyiwan/chatglm-6b-fine-tuning
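The "split GPU memory evenly" idea from this branch can be sketched as a device map that assigns transformer layers to GPUs in contiguous, near-equal blocks. This is a hypothetical illustration, not the branch's actual code: the function name `even_device_map`, the module names (`transformer.word_embeddings`, `transformer.layers.N`, `transformer.final_layernorm`, `lm_head`) and the 28-layer count are assumptions to check against `model.named_modules()` and the checkpoint's config.

```python
from typing import Dict


def even_device_map(num_layers: int, num_gpus: int) -> Dict[str, int]:
    """Assign transformer layers to GPUs in contiguous, near-equal blocks.

    Module names are hypothetical placeholders; compare them with
    model.named_modules() for the checkpoint you actually load.
    """
    if num_layers < 0 or num_gpus < 1:
        raise ValueError("need num_layers >= 0 and num_gpus >= 1")
    # Non-layer modules stay together on GPU 0 so input_ids and the
    # embedding weight end up on the same device.
    device_map: Dict[str, int] = {
        "transformer.word_embeddings": 0,
        "transformer.final_layernorm": 0,
        "lm_head": 0,
    }
    base, extra = divmod(num_layers, num_gpus)
    layer = 0
    for gpu in range(num_gpus):
        # Leftover layers go to the last GPUs, since GPU 0 already holds
        # the embedding and output modules.
        count = base + (1 if gpu >= num_gpus - extra else 0)
        for _ in range(count):
            device_map[f"transformer.layers.{layer}"] = gpu
            layer += 1
    return device_map


if __name__ == "__main__":
    dm = even_device_map(num_layers=28, num_gpus=3)
    per_gpu: Dict[int, int] = {}
    for name, gpu in dm.items():
        if name.startswith("transformer.layers."):
            per_gpu[gpu] = per_gpu.get(gpu, 0) + 1
    print(per_gpu)  # {0: 9, 1: 9, 2: 10}
```

Keeping the embedding, final norm and output head together on GPU 0 is one way to make `input_ids` meet the embedding weight on the same device; giving the leftover layers to later GPUs offsets that extra load. A dictionary of this shape is what `accelerate.dispatch_model(model, device_map=...)` accepts.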

[Feature] Created a branch that supports multi-GPU deployment and automatically splits GPU memory evenly.

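> > @xiaoweiweixiao Fine-tuning won't work this way. Fine-tune on a single GPU first, then deploy across multiple GPUs. Alternatively, write your own training code and, wherever tensors are operated on, move them onto the same device first. > > Oh, so this can only be used for deployment? Does "moving the tensors onto the same device before operating on them" reduce the memory requirement on a single GPU? Would you consider releasing distributed training code? O.O No, training can use multiple GPUs too.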
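The advice to move tensors onto the same device before operating on them can be illustrated with a single PyTorch operation. This is a minimal sketch, not code from the linked repository: the helper `add_on_same_device` is hypothetical, and the demo falls back to CPU when fewer than two GPUs are available. In a model split across GPUs, the same `.to(device)` call would go wherever activations cross from one GPU's layers to the next.

```python
import torch


def add_on_same_device(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Elementwise sum that first moves `b` onto `a`'s device.

    Adding tensors that live on different devices raises
    "Expected all tensors to be on the same device ...".
    """
    return a + b.to(a.device)


if __name__ == "__main__":
    # Use two GPUs when available; otherwise everything runs on CPU.
    dev0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    dev1 = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")
    x = torch.ones(2, 3, device=dev0)
    y = torch.full((2, 3), 2.0, device=dev1)
    out = add_on_same_device(x, y)
    print(out.device, out.sum().item())
```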

Error during P-tuning evaluation: The expanded size of the tensor (140) must match the existing size (312) at non-singleton dimension 0. Target sizes: [140]. Tensor sizes: [312]

Don't call `model.eval()`.


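How to train with model parallelism

On incremental pre-training based on ChatGLM-6B

[BUG/Help] <Why does the Web_demo from P-tuning lose the model's original conversational ability after fine-tuning?>

[BUG/Help] Why doesn't the README have a tutorial on multi-GPU P-tuning, or at least one sentence explaining what happens? This problem has been bothering a lot of people.

[Feature] Created a branch that supports multi-GPU deployment and automatically splits GPU memory evenly.

[Feature] Created a branch that supports multi-GPU deployment and automatically splits GPU memory evenly.

Error during P-tuning evaluation: The expanded size of the tensor (140) must match the existing size (312) at non-singleton dimension 0. Target sizes: [140]. Tensor sizes: [312]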

Cannot load the checkpoint

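Format of the train data

Is this a pre-existing bug in train.py? With the correct data path set and the data loaded successfully, a ValueError is still raised. Which step is actually going wrong?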
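For the train-data format and ValueError questions, a quick pre-flight check of the data file can at least rule out malformed input. This sketch assumes the JSON-lines layout of the ADGEN example in the ChatGLM-6B ptuning README: one JSON object per line, with prompt and response stored in the columns named by `--prompt_column` and `--response_column` (`content` and `summary` in that example). The helper names `check_lines` and `check_file` are hypothetical, and the check does not diagnose any bug inside train.py itself.

```python
import json
from typing import Iterable, List, Tuple


def check_lines(lines: Iterable[str], prompt_column: str = "content",
                response_column: str = "summary") -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for JSON-lines training data."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        for column in (prompt_column, response_column):
            value = record.get(column)
            if not isinstance(value, str) or not value:
                problems.append((lineno, f"missing or empty '{column}'"))
    return problems


def check_file(path: str, **columns: str) -> List[Tuple[int, str]]:
    """Read a UTF-8 file and run check_lines over it."""
    with open(path, encoding="utf-8") as f:
        return check_lines(f, **columns)


if __name__ == "__main__":
    sample = [
        '{"content": "a prompt", "summary": "a response"}',
        '{"content": "only a prompt"}',
        "not json",
    ]
    for lineno, problem in check_lines(sample):
        print(lineno, problem)
```

If your data uses other column names, pass them the same way you pass them to training, e.g. `check_file("train.json", prompt_column="q", response_column="a")`.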