Zhengxiao Du

Results 163 comments of Zhengxiao Du

This problem is likely because a newer version of DeepSpeed renamed `_restore_from_fp16_weights` to `_restore_from_bit16_weights` without updating the call site in DeepSpeedEngine. One workaround is to use an older version of DeepSpeed (we use 0.3.16); another is to edit deepspeed/runtime/engine.py and change the line `self.optimizer._restore_from_fp16_weights()` to `self.optimizer._restore_from_bit16_weights()`.
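If editing the installed DeepSpeed source is inconvenient, a small compatibility shim can alias the renamed method back onto the optimizer. This is a sketch under the assumption above (the rename to `_restore_from_bit16_weights`); `patch_restore` is a hypothetical helper, not a DeepSpeed API:

```python
def patch_restore(optimizer):
    """Alias the renamed method so older engine code keeps working.

    Hypothetical helper: if the optimizer only exposes the new
    `_restore_from_bit16_weights` name, re-expose it under the old
    `_restore_from_fp16_weights` name that DeepSpeedEngine still calls.
    """
    if (not hasattr(optimizer, "_restore_from_fp16_weights")
            and hasattr(optimizer, "_restore_from_bit16_weights")):
        optimizer._restore_from_fp16_weights = optimizer._restore_from_bit16_weights
    return optimizer
```

You would apply it to the engine's optimizer before resuming from a checkpoint; pinning the DeepSpeed version is still the simpler fix.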

It should be `{"prompt": "", "text": novel text}`. `prompt` is the part that should not be generated; if you want to generate a novel from scratch, you can use an empty prompt. Splitting into 512-token segments is done automatically by the program, so you do not need to do it in the data.
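For reference, a minimal sketch of writing one such line to a JSONL file (the file name and sample text are placeholders):

```python
import json

# One training example: an empty "prompt" means the whole "text" is
# generated from scratch; 512-token segmentation happens later in the
# training code, not here.
example = {"prompt": "", "text": "Chapter one of the novel ..."}

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

`ensure_ascii=False` keeps non-ASCII text (e.g. Chinese) readable in the file instead of escaping it.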

The two ways of multi-task learning are designed for an ablation study of different objectives. To enable adaptation to different downstream tasks, we mix all three types of objectives in...

These are arguments of the DeepSpeed launcher. `NUM_WORKERS` is used to set `--num_nodes`, the number of servers used for pretraining. `NUM_GPUS_PER_WORKER` is used to set `--num_gpus`, which means...
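As a sketch of how the two variables map onto the launcher flags (the entry-script name here is hypothetical, and the counts are example values):

```python
# Build the DeepSpeed launcher command from the two script variables.
num_workers = 2          # --num_nodes: number of servers
num_gpus_per_worker = 8  # --num_gpus: GPUs per server
cmd = ["deepspeed",
       "--num_nodes", str(num_workers),
       "--num_gpus", str(num_gpus_per_worker),
       "pretrain_glm.py"]  # hypothetical entry script
print(" ".join(cmd))
# → deepspeed --num_nodes 2 --num_gpus 8 pretrain_glm.py
```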

You need to split the downloaded checkpoint with `change_mp.py`, following the instructions in https://github.com/THUDM/GLM#model-parallelism

You cannot install apex with `pip install apex`, because there is an unrelated package named apex on PyPI that has nothing to do with deep learning, which is why we didn't list...

> Any update on this? I notice that there is a out of box pretrain version for GLM-10B. Would like to know whether there are any future plan on uploading...

Could you manually run `ctypes.cdll.LoadLibrary("C:\Users\oo.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels.so")` and see what error it reports?
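A sketch of such a check, wrapping the load in try/except to capture the loader's error message. Note the raw string: in a plain Python string literal, the `\U` in `C:\Users\...` would be parsed as a unicode escape and raise a SyntaxError. The path below is just the placeholder from above:

```python
import ctypes

def try_load(path):
    """Attempt to load a shared library; return None on success,
    otherwise the loader's error message as a string."""
    try:
        ctypes.cdll.LoadLibrary(path)
        return None
    except OSError as e:
        return str(e)

# Raw string (r"..."): a plain "C:\Users\..." literal would fail to
# parse because \U starts a \UXXXXXXXX unicode escape.
err = try_load(r"C:\Users\oo.cache\huggingface\modules\transformers_modules"
               r"\chatglm-6b-int4\quantization_kernels.so")
print(err)
```

The printed message usually names the missing dependency or the bad path, which narrows down why the quantization kernels fail to load.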

You can pull the latest repository code.

For this use case you can directly use the quantized models https://huggingface.co/THUDM/chatglm-6b-int4 and https://huggingface.co/THUDM/chatglm-6b-int8. However, even after quantization, GPU inference still relies on kernels written in CUDA, so I don't think it will work. To really solve this, the CUDA kernels would need to be ported to ROCm.