Saeed Khaki

9 comments by Saeed Khaki

I think you do not need the BOS token, but you do need the EOS token.
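For context, a minimal sketch of what that looks like when formatting training examples, assuming a Llama/Mistral-style tokenizer where BOS is added automatically during encoding but EOS is not; `format_example` and the model name are illustrative, not from the original thread:

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

def format_example(prompt: str, completion: str) -> str:
    # Do not prepend BOS here; the tokenizer adds it during encoding
    # (add_special_tokens=True by default).
    # Append EOS explicitly so the model learns where a response ends.
    return prompt + completion + tokenizer.eos_token
```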

Never mind, if you set `use_cache=False`, it will work. Thanks
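As a rough illustration, assuming `model` is an already-loaded transformers causal LM:

```
# Minimal sketch: the KV cache only helps generation, and it conflicts with
# gradient checkpointing during training, so it is disabled here.
model.config.use_cache = False
```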

@younesbelkada Thanks for this solution. I am using the accelerate multi-GPU config and it works well for Mixtral with DPO. My GPUs are 8x A100 40GB. However, it goes OOM...

@janphilippfranken Did deepspeed work for you? It does not work for me.

I see, but it becomes very slow. It does not use the GPUs' full capacity; GPU utilization is low.

@younesbelkada Thanks. It still goes OOM. I added `attn_implementation="flash_attention_2"` and set `use_cache=False`. This is my training script and how I call it:

```
accelerate launch --config_file ./accelerate_configs/multi_gpu.yaml --num_processes=8 \
  rlhf_dpo_4bit.py...
```
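For reference, a sketch of how the model loading inside a script like this might look; this is not the author's `rlhf_dpo_4bit.py`, and the model name and quantization settings are assumptions:

```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization to help fit Mixtral on 40GB cards (assumed settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
    torch_dtype=torch.bfloat16,
    use_cache=False,
)
```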

@younesbelkada Update: I tried using the deepspeed_zero2 config and adding CPU offload options, and it still goes OOM. Here is my ZeRO config:

```
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard...
```
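For comparison, a hedged sketch of what a ZeRO-2 setup with optimizer offload to CPU can look like when expressed as a Python dict and passed to `TrainingArguments(deepspeed=...)`; the bucket sizes and dtype are assumptions, not the config used above:

```
# Illustrative ZeRO-2 config with CPU optimizer offload (assumed values).
ds_zero2_offload = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "allgather_bucket_size": 2e8,
        "reduce_bucket_size": 2e8,
    },
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```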

@younesbelkada Just a quick update, I managed to get it working with ZeRO-3 + offloading, and by adding:

```
from deepspeed.utils import set_z3_leaf_modules
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

set_z3_leaf_modules(model, [MixtralSparseMoeBlock])
```

it significantly...
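To show where that call fits, here is a rough end-to-end sketch under the same ZeRO-3 + offloading assumption; the model name and surrounding setup are placeholders, not the exact script used above:

```
from deepspeed.utils import set_z3_leaf_modules
from transformers import AutoModelForCausalLM
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    use_cache=False,
)

# Treat each sparse MoE block as a single ZeRO-3 "leaf": its parameters are
# gathered as one unit, which avoids the hangs/inefficiency caused by only a
# subset of experts being activated per token.
set_z3_leaf_modules(model, [MixtralSparseMoeBlock])

# ...then build the trainer (e.g. TRL's DPOTrainer) with this model as usual.
```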

@Emerald01 Could you share your ZeRO-2 config? Do you use CPU offloading? I have the same problem: it goes out of memory after some steps with Mixtral. My env:...