alignment-handbook
Robust recipes to align language models with human and AI preferences
After using Axolotl to SFT my Mistral-7B model, I tried to align it using DPO. At some point in the code (in the DPOTrainer initialization) the code freezes and stops...
Hello! Thanks for your awesome work! I ran into an issue when running DPO with QLoRA. I noticed there is a setting: ``` if model_args.use_peft is True: ref_model = None...
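A minimal sketch of why the reference model can be omitted when training with PEFT/QLoRA (the helper name `select_ref_model` is hypothetical, for illustration only): with adapters, the frozen base weights already serve as the reference policy, so TRL's `DPOTrainer` accepts `ref_model=None` and reuses the policy model with adapters disabled when it needs reference log-probs, instead of loading a second full copy.

```python
def select_ref_model(use_peft: bool, model_name: str):
    """Hypothetical helper mirroring the handbook's logic: return the
    reference-model argument to pass to DPOTrainer.

    When PEFT is enabled, return None: the trainer disables the adapters
    to compute reference log-probs from the frozen base weights, saving
    the memory of a second full model.
    """
    if use_peft:
        return None       # base weights (adapters off) act as the reference
    return model_name     # otherwise load a separate frozen reference model

# With QLoRA enabled, no extra 7B reference copy is materialized.
print(select_ref_model(True, "mistralai/Mistral-7B-v0.1"))   # None
print(select_ref_model(False, "mistralai/Mistral-7B-v0.1"))  # model name
```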
The results reported in https://github.com/huggingface/alignment-handbook/pull/88 suggest that QLoRA is better for both SFT and DPO. Is this accurate, and have people seen this happen in any other settings?
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
When running the DPO script, the error is raised by the following call:

```python
#####################
# Apply chat template
#####################
raw_datasets = raw_datasets.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer, "task": "dpo"},
    num_proc=data_args.preprocessing_num_workers,
    remove_columns=column_names,
    desc="Formatting comparisons with prompt template",
)
```
...
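A hedged pre-flight check for the Jinja error above (the function `roles_alternate` is an illustrative sketch, not part of the handbook): many chat templates require messages to strictly alternate user/assistant after an optional system turn, so a DPO dataset whose conversation starts with an assistant turn, or contains two consecutive turns from the same role, will trigger this exception.

```python
def roles_alternate(messages):
    """Return True if a list of {"role": ..., "content": ...} dicts
    satisfies the strict user/assistant alternation that many chat
    templates enforce. A single leading "system" turn is allowed."""
    roles = [m["role"] for m in messages]
    if roles and roles[0] == "system":
        roles = roles[1:]           # templates typically tolerate one system turn
    if not roles or roles[0] != "user":
        return False                # conversation must open with a user turn
    expected = "user"
    for role in roles:
        if role != expected:
            return False            # same role twice in a row, or wrong order
        expected = "assistant" if expected == "user" else "user"
    return True

ok = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]
bad = [{"role": "assistant", "content": "hello"}, {"role": "user", "content": "hi"}]
print(roles_alternate(ok))   # True
print(roles_alternate(bad))  # False
```

Running a check like this over the "chosen" and "rejected" columns before `raw_datasets.map(...)` makes it easy to find the offending rows.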
Hello everyone, I'm encountering a memory issue while fine-tuning a 7b model (such as Mistral) using a repository I found. Despite having 6 H100 GPUs at my disposal, I run...
Here's the call I'm using to run the script:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch \
  --config_file examples/hf-alignment-handbook/configs/accelerate_configs/deepspeed_zero3.yaml \
  --num_processes=2 \
  examples/hf-alignment-handbook/run_sft.py \
  examples/hf-alignment-handbook/configs/training_configs/zephyr-7b-beta/config_lora_sft.yaml \
  --load_in_4bit=true
```

Here's the full trace of the error: ``` 2023-12-01 00:05:43...
In the AI Feedback (AIF) phase, with GPT-4 serving as the teacher model, I am curious to know whether there might be a propensity for GPT-4 to assign higher ratings to...
I encountered this error (`ImportError: /flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol`) and tried many ways to solve it, but failed. Could you kindly share your version of flash-attn?
I am trying to train a Yi-34B model using a LoRA setup on multiple GPUs, but I am getting a constant loss of around 2 throughout my SFT training over 4 epochs. And inferencing...