The_Stallion
I tried with a different model and it works fine:

```python
from peft import LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
from ...
```
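For reference, since the snippet above is cut off, here is a minimal runnable version of the same setup, assuming the standard PEFT flow where the base model is wrapped with `get_peft_model`:

```python
# Minimal sketch completing the truncated snippet above, assuming the
# standard PEFT quickstart flow; nothing here beyond the wrap step is
# confirmed by the original post.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # confirms only the LoRA adapters are trainable
```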
I am using QLoRA with 4-bit quantization but somehow I get the same error. For more detail, this is the config I used:

```
BitsAndBytesConfig {
  "_load_in_4bit": true,
  "_load_in_8bit": false,
  ...
```
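Since the config dump above is truncated, here is a sketch of how a typical 4-bit QLoRA config is built; everything beyond `load_in_4bit=True` is an assumption on my part, and the model name is a placeholder:

```python
# Hedged sketch of a common 4-bit QLoRA quantization config; the values
# beyond load_in_4bit are typical choices, not the original poster's.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NormalFloat4, the usual QLoRA choice
    bnb_4bit_use_double_quant=True,          # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 despite 4-bit storage
)

model = AutoModelForCausalLM.from_pretrained(
    "model-name",  # placeholder: the original post doesn't name the model
    quantization_config=bnb_config,
)
```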
Hello, `pip install networkx-metis` and https://pypi.org/project/networkx-metis/ don't work for me either. I also tried to clone the repo, build, and install, and got Cython compilation errors. (I...
> Hi @dipanjanS! Thanks for the issue, I had a deeper look. Previously there was a silent bug in transformers that was quantizing the `pre_classifier` layer, which shouldn't happen...
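For anyone hitting this, one way to make the exclusion explicit on the user side is `llm_int8_skip_modules`; whether the actual patch works this way is my assumption, but the option itself is a real `BitsAndBytesConfig` parameter:

```python
# Hedged sketch: explicitly keep the classification head out of quantization.
# Using llm_int8_skip_modules for `pre_classifier` is my assumption, not
# necessarily what the transformers fix did internally.
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["pre_classifier", "classifier"],
)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",  # pre_classifier is the DistilBERT head, so this fits
    quantization_config=bnb_config,
)
```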
Is your model loaded on a single GPU? (I know that you are using DeepSpeed stage 3, which partitions model params across nodes, but I just wanted to...
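A quick sanity check I'd run (assuming `model` is the model object from the thread):

```python
# Hedged sketch: list the devices the parameters actually live on.
# Caveat: under DeepSpeed ZeRO stage 3 the parameters are partitioned, so
# outside a gathering context many of them appear as empty placeholder tensors.
devices = {p.device for p in model.parameters()}
print(devices)  # a single entry like {device(type='cuda', index=0)} means one GPU
```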
I finally used FSDP and it works:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  ...
```
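The config above is cut off; for context, a typical `accelerate` FSDP config continues with keys like the ones below. These are illustrative defaults, not the poster's actual values:

```yaml
  # Hedged continuation: common keys from `accelerate config`, values assumed.
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_use_orig_params: true
machine_rank: 0
mixed_precision: bf16
num_machines: 1
num_processes: 8
```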
> ```
> [2024-12-23 10:54:26,589] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
>     return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, ...
> ```
I forgot to write a follow-up with the temporary fix I found. I removed the use of flash-attn; it has been some time, but I think I ended up removing...
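In case it helps anyone else, disabling flash-attn at load time looks like this (the model name is a placeholder, and I'm assuming the model is loaded through transformers):

```python
# Hedged sketch: fall back from flash-attn to the default attention kernels.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "model-name",                 # placeholder: the original post doesn't name it
    attn_implementation="eager",  # instead of "flash_attention_2"
)
```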
The goal of my work was to understand how PPO works by switching OpenRLHF to run on a single node. I got it working at some point, but I didn't...