amulil
> Please complete the git commit id so that we can reproduce it.

@pppppM

```
commit 6892d65ab93184024c561a4fdc0d5653a8f77299
Author: Zhihao Lin
Date:   Mon Nov 20 14:46:45 2023 +0800

    [Fix] Fix bugs...
```
Updated the DPO implementation. Using SFT data, the pipeline runs end to end:

`NPROC_PER_NODE=8 xtuner train internlm2_chat_1_8b_qlora_dpo_ultra_e3 --deepspeed deepspeed_zero2`

But there are two problems:
1. the loss is NaN
2. the deepcopy approach does not support quantized loading; only LoRA and non-quantized loading run through

@xiaohangguo @pppppM could you take a look at what causes these two issues?
> How about rebuilding ref_model directly from the llm's config?
>
> For the NaN loss, @xiaohangguo may need to help check the formula details

Sure, I'll try rebuilding it from the llm's config.
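One frequent cause of NaN in DPO training is evaluating log σ(·) naively, since σ(z) underflows to 0 for large |z| and the log then blows up. A minimal sketch of the DPO loss with a numerically stable log-sigmoid (plain Python with hypothetical helper names, for illustration only; the actual implementation would use torch tensors):

```python
import math

def log_sigmoid(z):
    # Stable log(sigmoid(z)) = min(z, 0) - log(1 + exp(-|z|)).
    # The naive form math.log(1 / (1 + math.exp(-z))) overflows or
    # returns -inf for large |z|, which surfaces as NaN in the loss.
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # DPO: -log sigmoid(beta * ((pi_w - pi_l) - (ref_w - ref_l)))
    logits = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    return -log_sigmoid(beta * logits)
```

Even for extreme log-ratios the loss stays finite, e.g. `dpo_loss(-1.0, -2.0, -1.5, -1.5)` ≈ 0.644.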
`NPROC_PER_NODE=8 xtuner train internlm2_chat_1_8b_full_dpo_ultra_e3 --deepspeed deepspeed_zero2`

Full DPO loss is normal now. Next, I will add QLoRA DPO following the notes in the trl docs: https://moon-ci-docs.huggingface.co/docs/trl/pr_1193/en/dpo_trainer#downsides-to-merging-qlora-before-dpo-approach-2
> @amulil Do you have any metric comparisons for DPO-trained models yet? I'd like to reference this implementation: [RLHF-V](https://arxiv.org/abs/2312.00849), code: https://github.com/RLHF-V/RLHF-V, https://github.com/thunlp/Muffin

@KooSung Not yet. Later I will compare metrics against the [zephyr-7b-dpo-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-dpo-qlora) model mentioned in https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-beta/README.md.
@HIT-cwh I use this config, only setting batch_size=4: https://github.com/InternLM/xtuner/blob/193f614ffbb2463010808ebb2e689331a9c5e4f6/xtuner/configs/qwen/qwen1_5/qwen1_5_0_5b_chat/qwen1_5_0_5b_chat_qlora_alpaca_e3.py#L40C8-L40C8 Then I train with the command `CUDA_VISIBLE_DEVICES=4,5,6,7 NPROC_PER_NODE=4 xtuner train qwen1_5_0_5b_chat_qlora_alpaca_e3`. Thanks for your tip; I hadn't installed flash-attn. After...
Does this mean that I can't turn off the KV cache now? I'm worried that turning on the KV cache will cause the model to use historical data when generating answers each time. I don't...
The `use_cache` flag controls whether the KV cache is enabled when the model is loaded Hugging Face style. This configuration item is currently unavailable in vllm.
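For example, with Hugging Face `transformers` the flag lives on the model config and can be set at construction time or flipped afterwards (a minimal sketch; a tiny randomly initialized GPT-2 stands in for a real checkpoint, where you would instead pass `use_cache=...` to `AutoModelForCausalLM.from_pretrained(...)`):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random model as a stand-in for a real checkpoint.
config = GPT2Config(n_layer=2, n_head=2, n_embd=32, use_cache=False)
model = GPT2LMHeadModel(config)

print(model.config.use_cache)   # False: KV cache disabled for generation

# The flag can also be toggled after loading:
model.config.use_cache = True
```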
> And please note that enabling KV cache never affects your model outputs.

I tested the Hugging Face model with `use_cache`. `use_cache=true` causes the output to be the same if...
> It seems that 40GB memory is not enough for 70B-QLoRA, even with `deepspeed_zero2_offload`.
>
> You can also try to reduce the length of each sample by setting...
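Shortening samples is a config-level change: activation memory scales roughly with sequence length, so lowering `max_length` is a cheap first lever. An illustrative fragment (field names follow the existing xtuner dataset configs; the concrete values are assumptions to adapt to your setup):

```python
# Illustrative excerpt from an xtuner config
max_length = 1024            # e.g. reduced from 2048 to cut activation memory
pack_to_max_length = True    # pack short samples together to keep throughput

train_dataset = dict(
    type=process_hf_dataset,
    dataset=dict(type=load_dataset, path=data_path),
    tokenizer=tokenizer,
    max_length=max_length,
    pack_to_max_length=pack_to_max_length,
)
```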