Benjamin Bossan
Thanks for bringing FSDP2 to our (or at least my) attention. The changes described in the document you linked sound very reasonable and could remove some of the common pain...
Thanks a lot for clarifying my confusion. In that case, I think it makes sense to wait until FSDP2 is released and then run experiments with accelerate to see how...
Hi, I checked the notebook for possible problems. Some issues I saw:

- It manually calls `peft_model = get_peft_model(model, peft_config)` but then passes the `peft_config` again to `SFTTrainer`, which leads...
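On the first point, the two ways of applying PEFT are mutually exclusive: either pass the config to the trainer, or wrap the model yourself, but not both. A minimal sketch of the two options, assuming a causal LM; the model id and toy dataset are placeholders, and the exact `SFTTrainer` arguments may vary between trl versions:

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(task_type="CAUSAL_LM")
dataset = Dataset.from_dict({"text": ["hello world", "foo bar"]})  # toy data
args = SFTConfig(output_dir="tmp-sft", max_steps=1, report_to="none")

# Option 1: pass the config and let the trainer apply PEFT internally.
trainer = SFTTrainer(
    model="facebook/opt-125m",  # trl also accepts a model id string here
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)

# Option 2: wrap the model yourself; in that case, do NOT pass peft_config
# again, otherwise PEFT would be applied a second time.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
peft_model = get_peft_model(base, peft_config)
trainer = SFTTrainer(model=peft_model, args=args, train_dataset=dataset)
```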
Do I understand correctly that your goal is:

1. First train LoRA on the dataset
2. Save the LoRA model
3. Load the LoRA model
4. Train prompt tuning on...
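If so, one way to realize these steps could look like the sketch below. It assumes a plain causal LM for brevity and that the LoRA weights should be merged into the base model before adding prompt tuning; the model and path names are placeholders:

```python
from peft import (
    LoraConfig, PeftModel, PromptTuningConfig, TaskType, get_peft_model
)
from transformers import AutoModelForCausalLM

# 1. Train LoRA on the dataset (training loop omitted).
base = AutoModelForCausalLM.from_pretrained("my-base-model")
lora_model = get_peft_model(base, LoraConfig(task_type=TaskType.CAUSAL_LM))
# ... train ...

# 2. Save the LoRA adapter.
lora_model.save_pretrained("lora-checkpoint")

# 3. Load the LoRA adapter onto a fresh base model.
base = AutoModelForCausalLM.from_pretrained("my-base-model")
loaded = PeftModel.from_pretrained(base, "lora-checkpoint")

# 4. Merge LoRA into the base weights, then add prompt tuning on top.
merged = loaded.merge_and_unload()
pt_config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
pt_model = get_peft_model(merged, pt_config)
# ... train prompt tuning ...
```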
I see, thanks for explaining further. I don't think that prompt-tuning is fundamentally broken with Qwen 2.5 VL. When I changed your script to use the original dataset from the...
For completeness, here is what I used:

```python
import os

from datasets import load_dataset
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import LoraConfig, PromptTuningInit, PromptTuningConfig, TaskType, PeftModel
from ...
```
Using my script above, I set `max_steps=10` to avoid OOM, then evaluated the model like so:

```python
def process_inputs(conversation):
    # Preparation for inference
    text = processor.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs, ...
```
Ah, I had a mistake in my code: I used `model.generate`, but it should have been `trainer.model.generate`, since `model` is just the base model without prompt tuning. If I use...
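To illustrate the difference, a sketch continuing the snippets above (`trainer`, `process_inputs`, and `conversation` are assumed to come from the training and evaluation scripts, so this is not self-contained):

```python
inputs = process_inputs(conversation)

# Wrong: `model` is still the unwrapped base model, so this generation
# ignores the trained prompt-tuning weights.
output_ids = model.generate(**inputs, max_new_tokens=128)

# Right: `trainer.model` is the PEFT-wrapped model that carries the
# trained virtual tokens.
output_ids = trainer.model.generate(**inputs, max_new_tokens=128)
```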
Nice catch, thanks for reporting. Indeed, for training there is no need for `use_cache`. IIUC, it will be [disabled automatically](https://github.com/huggingface/transformers/pull/41585) when training with transformers starting with v5.
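Until then, it can be turned off manually before training, e.g. (the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("my-base-model")
# The KV cache only helps during generation, not during training.
model.config.use_cache = False
```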
If you train a model on GPU, save it, then load it on a machine without GPU, it should already work and be automatically transferred to CPU. Please give this...
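A minimal sketch of what this would look like on the CPU-only machine, assuming a PEFT adapter checkpoint; the model id and path are placeholders:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Without a CUDA device available, the weights are loaded on CPU by default.
base = AutoModelForCausalLM.from_pretrained("my-base-model", torch_dtype=torch.float32)
model = PeftModel.from_pretrained(base, "path/to/saved/adapter")
print(next(model.parameters()).device)  # expected: cpu
```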