mces89
Thanks, hope to see dbrx support soon.
Thanks. For dbrx, does it support fine-tuning, and how much memory is required for full/lora/qlora?
@hiyouga From this link: https://huggingface.co/databricks/dbrx-instruct/discussions/18#660c2f4ee6569b99a2e03f63 it seems dbrx requires much more memory than the mixtral 8x22B listed in this repo's README table?
got the same issue too
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file ../../accelerate/fsdp_config.yaml \
    ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --do_eval \
    --model_name_or_path model_path \
    --dataset sample_dataset \
    --dataset_dir data \
    --template mistral \
    --finetuning_type lora \
    ...
```
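(For reference, the `fsdp_config.yaml` passed to `--config_file` is an accelerate FSDP config. A rough sketch of what such a config typically contains, assuming standard accelerate keys; the actual file in the repo's examples folder may differ:)

```yaml
# Sketch of a typical accelerate FSDP config (assumed keys; the repo's
# fsdp_config.yaml may use different values or key names).
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_offload_params: false        # set true to offload params to CPU
  fsdp_sharding_strategy: 1         # 1 = FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8                    # one process per GPU
use_cpu: false
```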
Tried mixtral 8x7B too and got the same error.
@hiyouga Got it. Is there any way to use multiple GPUs in a single node to do 8-bit QLoRA?
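(To clarify what I mean by 8-bit QLoRA across the GPUs of one node, roughly something like the sketch below using transformers + peft + bitsandbytes; the model id and target_modules are placeholders and this is not what LLaMA-Factory does internally:)

```python
# Minimal sketch: 8-bit quantized LoRA spread over all visible GPUs.
# Model id and target_modules are placeholders, not the library's defaults.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "databricks/dbrx-instruct"  # placeholder

bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",          # shard layers across all visible GPUs
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # placeholder module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Note that `device_map="auto"` just splits layers across the GPUs (naive model parallelism), which is not the same as FSDP or DeepSpeed sharding.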
@hiyouga Can you be more specific: can --pure_bf16 be used together with --bf16? And should I use CPU offload too?
Thanks. I used --pure_bf16 and --bf16 together with the ds3_cpu_offload deepspeed config, but still got the same error. I'm using this command: https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/full_multi_gpu/multi_node.sh Is it because I use torch.distributed.run? how...
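(The ds3_cpu_offload config I'm referring to is a DeepSpeed ZeRO-3 config with optimizer and parameter offload, roughly like the sketch below; the actual file in the repo may differ:)

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```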
I also tried using LoRA on one 8xA100 (80G) node for dbrx and it works. So FSDP+QLoRA uses more GPU memory than LoRA, which is not expected. Maybe there...