ChenDRAG

Results: 18 comments of ChenDRAG

Same problem here!

It seems that full fine-tuning has this problem, while LoRA doesn't. Could you share the YAML training configuration? Also, how many GPUs are you using? ![image](https://github.com/huggingface/alignment-handbook/assets/40993476/184babce-d75c-420c-808d-6ced6cbb765b)

Sorry, I did not encounter this problem. Are you using the official binarized dataset? What is your base model? Though I don't think either matters that much.

8 A40 cards. My new experiments also encounter this problem. ![image](https://github.com/huggingface/alignment-handbook/assets/40993476/46b95d22-4919-49a8-80d0-8d6befb6ad77) Difference between the two configurations: the previous run had batch size 4, gradient accumulation 2, 8 cards, lr 1e-7; the new run has batch size 8...
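A quick sanity check here is the effective batch size (per-device batch * gradient accumulation * number of cards). A minimal sketch; the new run's accumulation and card count are cut off above, so they are left for you to fill in:

```python
# Effective batch size = per-device batch * gradient accumulation steps * number of GPUs.
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    return per_device * grad_accum * num_gpus

# Previous run: batch size 4, accumulation 2, 8 cards.
print(effective_batch_size(per_device=4, grad_accum=2, num_gpus=8))  # 64

# New run: batch size 8; plug in the accumulation and card count from the new
# config (truncated above) to check whether the effective batch size still matches.
```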

@alvarobartt Thanks a lot for your kind help! However, in `scripts`, the instructions to reproduce the experiments are:

```
# Full training with ZeRO-3 on 8 GPUs
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file...
```

P.S. I tried `CUDA_VISIBLE_DEVICES=2,3,4,5,6,7,8,9 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --main_process_port 6000 scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml` and it still reports an OOM error on 8 x 46 GB cards.

> DeepSpeed ZeRO-3 will shard the model over several GPUs; this should resolve the OOM issues you see. Note we tested on A100 80GB GPUs, so you may need to...
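For anyone wondering why plain DDP (`multi_gpu.yaml`) runs out of memory here, a rough back-of-the-envelope sketch, assuming a 7B policy plus a frozen reference model for DPO, bf16 weights and gradients, and fp32 AdamW moments and master weights (activations excluded, so the real number is even higher):

```python
# Rough per-GPU memory for full-parameter DPO of a 7B model without any sharding.
params = 7e9
gib = 1024 ** 3

policy_weights    = params * 2   # bf16 policy weights
reference_weights = params * 2   # frozen bf16 reference model kept for the DPO loss
gradients         = params * 2   # bf16 gradients for the policy
adam_moments      = params * 8   # two fp32 AdamW moments per parameter
master_weights    = params * 4   # fp32 master copy of the policy weights

total = policy_weights + reference_weights + gradients + adam_moments + master_weights
print(f"~{total / gib:.0f} GiB per GPU without sharding")  # roughly 117 GiB

# ZeRO-3 shards the weights, gradients, and optimizer states across the 8 GPUs,
# which is what brings the per-GPU footprint back under the ~46 GB available here.
```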

@LiCHH @ma-xu @Kumbong It seems the generation seed is replicated for each class: `recon_B3HW = var.autoregressive_infer_cfg(B=B, label_B=label_B, cfg=cfg, top_k=900, top_p=0.96, g_seed=seed, more_smooth=more_smooth)`. `g_seed` should be different. I...
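A minimal sketch of the fix I have in mind, reusing the `var`, `cfg`, and `more_smooth` objects from the call above (`class_labels` and `base_seed` are placeholders I'm introducing here):

```python
# Give each class its own generation seed instead of reusing one g_seed,
# so the sampled images are not identical across classes.
base_seed = 0
recons = []
for i, label_B in enumerate(class_labels):  # class_labels: one label tensor per class
    recon_B3HW = var.autoregressive_infer_cfg(
        B=label_B.shape[0], label_B=label_B, cfg=cfg,
        top_k=900, top_p=0.96,
        g_seed=base_seed + i,               # a different seed per class
        more_smooth=more_smooth,
    )
    recons.append(recon_B3HW)
```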