hoshi-hiyouga comments

Results: 294 comments by hoshi-hiyouga

Why mess with the order of tags when using DualCL?

Thanks for your question. We randomly shuffle the labels to mitigate the bias introduced by the position embeddings in BERT models. In other words, we make the...

Why mess with the order of tags when using DualCL?

Thanks very much! It was indeed problematic. We have removed the random shuffling and set the position embeddings of all label tokens to zero. Therefore, the model's prediction is...
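One possible reading of the fix described above, as a minimal sketch: every label token is given position index 0, and only the remaining tokens get increasing positions. The helper name `build_position_ids` and the `is_label` mask are hypothetical, and whether the original code zeroes the position index or the embedding vector itself is not stated in the comment.

```python
def build_position_ids(is_label):
    """Assign position 0 to label tokens and consecutive positions to all other tokens.

    is_label: list of booleans, True where the token is a label token (assumed input format).
    """
    position_ids = []
    next_position = 0
    for flag in is_label:
        if flag:
            # Label tokens all share position 0, so their order carries no positional signal.
            position_ids.append(0)
        else:
            position_ids.append(next_position)
            next_position += 1
    return position_ids


# Hypothetical layout: one special token, three label tokens, then three text tokens.
mask = [False, True, True, True, False, False, False]
position_ids = build_position_ids(mask)
```

Because the result depends only on which slots hold label tokens, permuting the labels among those slots leaves the position ids unchanged.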

This is the most complete large-model training code I have seen so far

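Model training currently supports multi-turn dialogue; you need to specify the history column in dataset_info.json. For multi-turn dialogue training, the approach most commonly used at present is

```
q1 + a1 + q2 + a2 + q3 + a3
[IGNORE] + [IGNORE] + [IGNORE] + [IGNORE] + [IGNORE] + a3
```
...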
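The masking scheme in the comment above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: the function name `build_last_turn_example` is hypothetical, turns are assumed to be already tokenized, and `IGNORE_INDEX = -100` is assumed because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # assumed value; labels equal to this are skipped by the loss


def build_last_turn_example(turns):
    """Concatenate all turns and supervise only the final answer.

    turns: list of (question_ids, answer_ids) pairs of token-id lists, oldest first.
    """
    input_ids, labels = [], []
    last = len(turns) - 1
    for i, (q_ids, a_ids) in enumerate(turns):
        input_ids += q_ids + a_ids
        # Questions never contribute to the loss.
        labels += [IGNORE_INDEX] * len(q_ids)
        # Only the answer of the last turn is kept as a training target.
        labels += a_ids if i == last else [IGNORE_INDEX] * len(a_ids)
    return input_ids, labels


# Toy token ids standing in for q1, a1, q2, a2, q3, a3.
turns = [([1, 2], [3]), ([4], [5, 6]), ([7, 8], [9])]
input_ids, labels = build_last_turn_example(turns)
```

`input_ids` and `labels` always have the same length, so they can be padded and batched together.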

This is the most complete large-model training code I have seen so far

This may destroy the semantic information carried by BOS and EOS, so we do not recommend it.

This is the most complete large-model training code I have seen so far

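Sorry, my earlier statement may have been wrong. I re-read [Vicuna's training code](https://github.com/lm-sys/FastChat/blob/e365af782e2f99dd674d021087f8ecfa3840adff/fastchat/train/train.py#L77), and this approach can indeed speed up training on multi-turn dialogue. We are considering implementing something similar in the near future. Thanks for your suggestion!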
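A hedged sketch of the idea as I understand it: instead of supervising only the last answer, every answer in the conversation is kept as a target, so a single sequence yields a training signal for all turns. This is not the FastChat implementation; `build_all_turns_example` is a hypothetical name, the turns are assumed to be pre-tokenized, and `IGNORE_INDEX = -100` is assumed because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # assumed value; labels equal to this are skipped by the loss


def build_all_turns_example(turns):
    """Concatenate all turns, masking questions and supervising every answer.

    turns: list of (question_ids, answer_ids) pairs of token-id lists, oldest first.
    """
    input_ids, labels = [], []
    for q_ids, a_ids in turns:
        input_ids += q_ids + a_ids
        # Mask the question tokens, keep the answer tokens as targets.
        labels += [IGNORE_INDEX] * len(q_ids) + a_ids
    return input_ids, labels


# Toy token ids standing in for three question/answer turns.
turns = [([1, 2], [3]), ([4], [5, 6]), ([7, 8], [9])]
input_ids, labels = build_all_turns_example(turns)
```

With three turns, one forward pass now covers three answers, where supervising only the last answer would need three separate examples to cover the same targets.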

This is the most complete large-model training code I have seen so far

In the latest code, b6faf0207d5b637722a1fd45984d27b3ac095fd4, we have implemented training on multi-turn dialogue corpora. Also, we do not plan to add RWKV fine-tuning for now.

NameError: name 'awq_ext' is not defined

Uninstall autoawq and reinstall it from source: pip install git+https://github.com/casper-hansen/AutoAWQ.git

NameError: name 'awq_ext' is not defined

Install this as well: https://github.com/casper-hansen/AutoAWQ_kernels#requirements

PPO training fails with: Tensors must be CUDA and dense

Could you try printing the dtype and device of the tensors right after the AutoModel is loaded in load_pretrained?
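A minimal sketch of such a check, assuming `model` is a `torch.nn.Module` (for example, whatever the AutoModel call returns). The helper names `list_tensor_placement` and `non_cuda_tensors` are hypothetical and are not part of the repository; the code relies only on the standard `named_parameters()` and `named_buffers()` methods.

```python
def list_tensor_placement(model):
    """Return (name, dtype, device) as strings for every parameter and buffer."""
    rows = []
    for name, tensor in list(model.named_parameters()) + list(model.named_buffers()):
        rows.append((name, str(tensor.dtype), str(tensor.device)))
    return rows


def non_cuda_tensors(rows):
    """Keep only the entries whose device string does not start with 'cuda'."""
    return [row for row in rows if not row[2].startswith("cuda")]


# Usage, right after the model has been loaded:
#     rows = list_tensor_placement(model)
#     for name, dtype, device in rows:
#         print(name, dtype, device)
#     print("not on CUDA:", non_cuda_tensors(rows))
```

Any entry reported by `non_cuda_tensors` is a candidate cause for the error, since the message complains about tensors that are not on CUDA.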

Differences between chatglm and chatglm2 when fine-tuning

Our special handling for v2 is here: https://github.com/hiyouga/ChatGLM-Efficient-Tuning/blob/3a53dd404bec2831d93fd55f516f604f7993dfbb/src/utils/common.py#L228-L232