mces89
Thanks, hope to see dbrx support soon.
Thanks. For dbrx, does it support fine-tuning, and how much memory is required for full/lora/qlora?
@hiyouga From this link: https://huggingface.co/databricks/dbrx-instruct/discussions/18#660c2f4ee6569b99a2e03f63 it seems dbrx requires much more memory than the mixtral 8x22B listed in this repo's README table?
got the same issue too
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file ../../accelerate/fsdp_config.yaml \
    ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --do_eval \
    --model_name_or_path model_path \
    --dataset sample_dataset \
    --dataset_dir data \
    --template mistral \
    --finetuning_type lora \
    ...
```
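(For reference, the `fsdp_config.yaml` passed to `--config_file` is an accelerate FSDP config. A rough sketch of what such a config typically contains, assuming standard accelerate keys; the actual file in the repo's examples folder may differ:)

```yaml
# Sketch of a typical accelerate FSDP config (assumed keys; the repo's
# fsdp_config.yaml may use different values or key names).
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_offload_params: false        # set true to offload params to CPU
  fsdp_sharding_strategy: 1         # 1 = FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8                    # one process per GPU
use_cpu: false
```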
Tried mixtral 8x7B too and got the same error.
@hiyouga Got it. Is there any way to use multiple GPUs in a single node to do 8-bit QLoRA?
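(To clarify what I mean by 8-bit QLoRA across the GPUs of one node, roughly something like the sketch below using transformers + peft + bitsandbytes; the model id and target_modules are placeholders and this is not what LLaMA-Factory does internally:)

```python
# Minimal sketch: 8-bit quantized LoRA spread over all visible GPUs.
# Model id and target_modules are placeholders, not the library's defaults.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "databricks/dbrx-instruct"  # placeholder

bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",          # shard layers across all visible GPUs
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # placeholder module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Note that `device_map="auto"` just splits layers across the GPUs (naive model parallelism), which is not the same as FSDP or DeepSpeed sharding.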
@hiyouga Can you be more specific: can --pure_bf16 be used together with --bf16? And should I use CPU offload too?
Thanks. I used --pure_bf16 and --bf16 together with the ds3_cpu_offload deepspeed config, but still got the same error. I'm using this command: https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/full_multi_gpu/multi_node.sh Is it because I use torch.distributed.run? how...
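(The ds3_cpu_offload config I'm referring to is a DeepSpeed ZeRO-3 config with optimizer and parameter offload, roughly like the sketch below; the actual file in the repo may differ:)

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```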
I also tried using LoRA on one 8xA100 (80G) node for dbrx and it works. So FSDP+QLoRA uses more GPU memory than LoRA, which is not expected. Maybe there...