alignment-handbook
Robust recipes to align language models with human and AI preferences
I cannot replicate the DPO results for Zephyr. I use a modified version of config_full.yaml, with the only difference being that I set gradient_accumulation_steps: 4 instead of 2, because I...
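A hedged sketch of how such an override interacts with training, since this is a common source of drift from published results; the surrounding value below is an illustrative placeholder, not the recipe's actual number:

```yaml
# Illustrative excerpt of a modified config_full.yaml (values are placeholders).
# Effective batch = per_device_train_batch_size x num_gpus x gradient_accumulation_steps,
# so changing gradient_accumulation_steps from 2 to 4 doubles the effective batch
# (and shifts the optimizer trajectory) unless another factor is halved.
per_device_train_batch_size: 4   # e.g. halved to keep the effective batch constant
gradient_accumulation_steps: 4   # the modification described in the issue
```

A shifted effective batch size is one plausible reason DPO metrics diverge from the published run even when everything else matches.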
I modified deepspeed_zero3.yaml, set num_machines to 8 and num_processes to 8, and I got the following error. What else should I do to run SFT on an 8-node platform? Thanks...
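For reference, a hedged multi-node sketch of what deepspeed_zero3.yaml could look like, assuming 8 GPUs per node: in accelerate configs, num_processes is the total process count across all machines, so setting it to 8 alongside num_machines: 8 launches only one process per node.

```yaml
# Hedged sketch of a multi-node DeepSpeed ZeRO-3 accelerate config.
# Assumes 8 nodes x 8 GPUs; IP/port values are placeholders.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
num_machines: 8
num_processes: 64           # total across all nodes: 8 nodes x 8 GPUs
machine_rank: 0             # set to 0..7, one value per node
main_process_ip: 10.0.0.1   # placeholder: address of the rank-0 node
main_process_port: 29500
mixed_precision: bf16
```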
For QLoRA+FSDP support, the dependencies should be updated: - `bitsandbytes>=0.43.0` - `accelerate>=0.28.0` - `transformers>4.38.2` - `trl>0.7.11` - `peft>0.9.0` Also, it would be wonderful to have an accelerate recipe for this too.
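On the requested recipe: a hedged sketch of what an FSDP accelerate config for QLoRA could look like. The key names follow accelerate's FSDP plugin (the `fsdp_backward_prefetch` spelling assumes accelerate>=0.28, as pinned above); the exact values the handbook would ship are an assumption.

```yaml
# Hedged sketch of an accelerate FSDP config for QLoRA + FSDP
# (single node, 8 GPUs assumed); not an official handbook recipe.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true   # load 4-bit weights once, on rank 0
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
mixed_precision: bf16
num_machines: 1
num_processes: 8
```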
Hello, I'm interested in utilizing [run_dpo](https://github.com/huggingface/alignment-handbook/blob/main/scripts/run_dpo.py), but I'm unsure about the required parameters. Could someone provide me with some guidance on which parameters need to be passed?
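For context, the handbook scripts take a recipe YAML as their single positional argument, e.g. `accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml`, with individual fields overridable on the command line. A hedged, minimal recipe whose field names mirror the zephyr-7b-beta DPO config (values illustrative rather than authoritative):

```yaml
# Minimal DPO recipe sketch for scripts/run_dpo.py.
model_name_or_path: alignment-handbook/zephyr-7b-sft-full
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0
dataset_splits:
- train_prefs
- test_prefs
beta: 0.01
learning_rate: 5.0e-7
per_device_train_batch_size: 8
gradient_accumulation_steps: 2
num_train_epochs: 1
bf16: true
output_dir: data/zephyr-7b-dpo-full
```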
It is said that zephyr-7b-dpo-qlora is finetuned from zephyr-7b-sft-qlora. However, in the adapter config file, the base model is set to mistralai/Mistral-7B-v0.1. Also, I downloaded the model from https://huggingface.co/alignment-handbook/zephyr-7b-dpo-qlora, and...
I am trying to conduct CPT with mistral-instruct-v2, but every time I notice an overshoot in the grad norm. I tried different datasets and managed to reproduce the same...
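Not a root-cause fix, but the standard knobs for damping grad-norm spikes are plain transformers TrainingArguments fields that the handbook recipes pass through; values here are illustrative:

```yaml
# Hedged mitigation sketch for grad-norm overshoot during CPT.
max_grad_norm: 1.0      # clip spiking gradients
warmup_ratio: 0.1       # a longer warmup often damps early spikes
learning_rate: 5.0e-6   # lowering the LR is another common mitigation
```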
Environment: transformers 4.39.0.dev0, trl 0.7.10, torch 2.2.2, 8 x H100 (80GB). I am encountering an issue where the training process with DPO on a multi-GPU setup gets stuck. This problem...
So I'm attempting to run the DPO LoRA script and I'm getting this error: `RuntimeError: The size of tensor a (0) must match the size of tensor b (4096)...`
Hi, thanks for your great work! I'm especially interested in the recently introduced constitutional AI tuning in this [blog post](https://huggingface.co/blog/constitutional_ai). I've found the open-source [SFT model](https://huggingface.co/alignment-handbook/mistral-7b-sft-constitutional-ai) and [DPO model](https://huggingface.co/HuggingFaceH4/mistral-7b-anthropic) on the Hugging Face Hub. However,...
Since we now have the CPT task, it would be nice to be able to feed a tokenized and packed dataset in directly.
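Since this is a feature request, here is a purely hypothetical sketch of what such a recipe option could look like; `dataset_already_tokenized` does not exist in the handbook today, and the dataset id is a placeholder:

```yaml
# Hypothetical recipe excerpt sketching the requested feature.
dataset_mixer:
  my-org/packed-cpt-corpus: 1.0   # placeholder: dataset already carrying input_ids
dataset_already_tokenized: true   # proposed flag: skip tokenization and packing
max_seq_length: 2048              # would only sanity-check the packed block size
```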