amulil
> Please complete the git commit id so that we can reproduce it.

@pppppM

```
commit 6892d65ab93184024c561a4fdc0d5653a8f77299
Author: Zhihao Lin
Date:   Mon Nov 20 14:46:45 2023 +0800

    [Fix] Fix bugs...
```
Updated the DPO implementation. Using SFT data, the pipeline runs end to end:

`NPROC_PER_NODE=8 xtuner train internlm2_chat_1_8b_qlora_dpo_ultra_e3 --deepspeed deepspeed_zero2`

But there are two problems:
1. the loss is NaN
2. the deepcopy approach does not support quantized loading; only LoRA and non-quantized loading run through

@xiaohangguo @pppppM could you take a look at what causes these two issues?
> How about rebuilding ref_model directly from the llm's config?
>
> For the NaN loss, @xiaohangguo may need to help check the formula details

Sure, I'll try rebuilding it from the llm's config.
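One frequent cause of NaN in DPO training is evaluating log σ(·) naively, since σ(z) underflows to 0 for large |z| and the log then blows up. A minimal sketch of the DPO loss with a numerically stable log-sigmoid (plain Python with hypothetical helper names, for illustration only; the actual implementation would use torch tensors):

```python
import math

def log_sigmoid(z):
    # Stable log(sigmoid(z)) = min(z, 0) - log(1 + exp(-|z|)).
    # The naive form math.log(1 / (1 + math.exp(-z))) overflows or
    # returns -inf for large |z|, which surfaces as NaN in the loss.
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # DPO: -log sigmoid(beta * ((pi_w - pi_l) - (ref_w - ref_l)))
    logits = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    return -log_sigmoid(beta * logits)
```

Even for extreme log-ratios the loss stays finite, e.g. `dpo_loss(-1.0, -2.0, -1.5, -1.5)` ≈ 0.644.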
`NPROC_PER_NODE=8 xtuner train internlm2_chat_1_8b_full_dpo_ultra_e3 --deepspeed deepspeed_zero2`

Full DPO loss is normal now. Next, I will add QLoRA DPO following the notes in the trl docs: https://moon-ci-docs.huggingface.co/docs/trl/pr_1193/en/dpo_trainer#downsides-to-merging-qlora-before-dpo-approach-2
> @amulil Do you have any metric comparisons for DPO-trained models yet? I'd like to reference this implementation: [RLHF-V](https://arxiv.org/abs/2312.00849), code: https://github.com/RLHF-V/RLHF-V, https://github.com/thunlp/Muffin

@KooSung Not yet. Later I will compare metrics against the [zephyr-7b-dpo-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-dpo-qlora) model mentioned in https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-beta/README.md.
@HIT-cwh I use this config, only setting batch_size=4: https://github.com/InternLM/xtuner/blob/193f614ffbb2463010808ebb2e689331a9c5e4f6/xtuner/configs/qwen/qwen1_5/qwen1_5_0_5b_chat/qwen1_5_0_5b_chat_qlora_alpaca_e3.py#L40C8-L40C8 Then I train with the command `CUDA_VISIBLE_DEVICES=4,5,6,7 NPROC_PER_NODE=4 xtuner train qwen1_5_0_5b_chat_qlora_alpaca_e3`. Thanks for your tip; I hadn't installed flash-attn. After...
Does this mean that I can't turn off the KV cache now? I'm worried that turning on the KV cache will cause the model to use historical data when generating answers each time. I don't...
The `use_cache` flag controls whether the KV cache is enabled when the model is loaded Hugging Face style. This configuration item is currently unavailable in vllm.
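For example, with Hugging Face `transformers` the flag lives on the model config and can be set at construction time or flipped afterwards (a minimal sketch; a tiny randomly initialized GPT-2 stands in for a real checkpoint, where you would instead pass `use_cache=...` to `AutoModelForCausalLM.from_pretrained(...)`):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random model as a stand-in for a real checkpoint.
config = GPT2Config(n_layer=2, n_head=2, n_embd=32, use_cache=False)
model = GPT2LMHeadModel(config)

print(model.config.use_cache)   # False: KV cache disabled for generation

# The flag can also be toggled after loading:
model.config.use_cache = True
```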
> And please note that enabling KV cache never affects your model outputs.

I tested the Hugging Face model with `use_cache`. `use_cache=true` causes the output to be the same if...
> It seems that 40GB memory is not enough for 70B-QLoRA, even with `deepspeed_zero2_offload`.
>
> You can also try to reduce the length of each sample by setting...
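Shortening samples is a config-level change: activation memory scales roughly with sequence length, so lowering `max_length` is a cheap first lever. An illustrative fragment (field names follow the existing xtuner dataset configs; the concrete values are assumptions to adapt to your setup):

```python
# Illustrative excerpt from an xtuner config
max_length = 1024            # e.g. reduced from 2048 to cut activation memory
pack_to_max_length = True    # pack short samples together to keep throughput

train_dataset = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path=data_path),
    tokenizer=tokenizer,
    max_length=max_length,
    pack_to_max_length=pack_to_max_length,
)
```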