The_Stallion
I tried with a different model and it works fine:

```python
from peft import LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
from ...
```
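For reference, since the snippet above is cut off, here is a minimal runnable version of the same setup, assuming the standard PEFT flow where the base model is wrapped with `get_peft_model`:

```python
# Minimal sketch completing the truncated snippet above, assuming the
# standard PEFT quickstart flow; nothing here beyond the wrap step is
# confirmed by the original post.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # confirms only the LoRA adapters are trainable
```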
I am using QLoRA with 4-bit quantization but somehow I get the same error. For more detail, this is the config I used:

```
BitsAndBytesConfig {
  "_load_in_4bit": true,
  "_load_in_8bit": false,
  ...
```
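Since the config dump above is truncated, here is a sketch of how a typical 4-bit QLoRA config is built; everything beyond `load_in_4bit=True` is an assumption on my part, and the model name is a placeholder:

```python
# Hedged sketch of a common 4-bit QLoRA quantization config; the values
# beyond load_in_4bit are typical choices, not the original poster's.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NormalFloat4, the usual QLoRA choice
    bnb_4bit_use_double_quant=True,          # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 despite 4-bit storage
)

model = AutoModelForCausalLM.from_pretrained(
    "model-name",  # placeholder: the original post doesn't name the model
    quantization_config=bnb_config,
)
```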
Hello, `pip install networkx-metis` and https://pypi.org/project/networkx-metis/ don't work for me either. I also tried to clone the repo, build, and install, and got Cython compilation errors. (I...
> Hi @dipanjanS! Thanks for the issue, I had a deeper look. Previously there was a silent bug in transformers that was quantizing the `pre_classifier` layer, which shouldn't happen...
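For anyone hitting this, one way to make the exclusion explicit on the user side is `llm_int8_skip_modules`; whether the actual patch works this way is my assumption, but the option itself is a real `BitsAndBytesConfig` parameter:

```python
# Hedged sketch: explicitly keep the classification head out of quantization.
# Using llm_int8_skip_modules for `pre_classifier` is my assumption, not
# necessarily what the transformers fix did internally.
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["pre_classifier", "classifier"],
)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",  # pre_classifier is the DistilBERT head, so this fits
    quantization_config=bnb_config,
)
```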
Is your model loaded on a single GPU? (I know that you are using DeepSpeed stage 3, which partitions model params across nodes, but I just wanted to...
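A quick sanity check I'd run (assuming `model` is the model object from the thread):

```python
# Hedged sketch: list the devices the parameters actually live on.
# Caveat: under DeepSpeed ZeRO stage 3 the parameters are partitioned, so
# outside a gathering context many of them appear as empty placeholder tensors.
devices = {p.device for p in model.parameters()}
print(devices)  # a single entry like {device(type='cuda', index=0)} means one GPU
```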
I finally used FSDP and it works:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  ...
```
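The config above is cut off; for context, a typical `accelerate` FSDP config continues with keys like the ones below. These are illustrative defaults, not the poster's actual values:

```yaml
  # Hedged continuation: common keys from `accelerate config`, values assumed.
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_use_orig_params: true
machine_rank: 0
mixed_precision: bf16
num_machines: 1
num_processes: 8
```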
> ```
> [2024-12-23 10:54:26,589] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
>     return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, ...
> ```
I forgot to write a follow-up with the temporary fix I found. I removed the use of flash-attn; it has been some time, but I think I ended up removing...
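In case it helps anyone else, disabling flash-attn at load time looks like this (the model name is a placeholder, and I'm assuming the model is loaded through transformers):

```python
# Hedged sketch: fall back from flash-attn to the default attention kernels.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "model-name",                 # placeholder: the original post doesn't name it
    attn_implementation="eager",  # instead of "flash_attention_2"
)
```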
The goal of my work was to understand how PPO works by switching OpenRLHF to run on a single node. I got it working at some point, but I didn't...