Yu-won Lee

Results 230 comments of Yu-won Lee

If you are using images, you can just change the image pixels, and if you are using video data, you can just change the video pixels. Only using 48GB could...

```
--video_min_pixels (int): Option for minimum input tokens for video.
--video_max_pixels (int): Option for maximum tokens for video.
```

This is the max_pixel that you should set. Setting this...
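To make the pixel options concrete, here is a minimal sketch of turning a visual-token budget into a pixel budget. It assumes a Qwen2-VL-style processor in which one visual token covers a 28 x 28 pixel area; that factor, the helper name `pixels_for_tokens`, and the example budgets are assumptions for illustration, not values taken from the comment above.

```python
# Assumption: one visual token covers a 28 x 28 pixel area (Qwen2-VL-style).
PIXELS_PER_TOKEN_SIDE = 28


def pixels_for_tokens(n_tokens: int, side: int = PIXELS_PER_TOKEN_SIDE) -> int:
    """Return the pixel budget that corresponds to a visual-token budget."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return n_tokens * side * side


if __name__ == "__main__":
    # Illustrative budgets only; pick values that fit your GPU memory.
    min_pixels = pixels_for_tokens(128)
    max_pixels = pixels_for_tokens(768)
    print(f"min_pixels={min_pixels} max_pixels={max_pixels}")
```

Lowering the maximum budget is the knob that reduces memory use, since fewer pixels per frame means fewer visual tokens in the sequence.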

That's because my code was based on LLaVA. Also, I haven't written the code for FSDP, so you would need to tweak it a bit. It's fine to use SFTTrainer, it won't have...

I haven't tried grounding, so I'm not sure exactly. One reason could be `image_min_pixels` and `image_max_pixels`. If you set these to different values during training and testing, I think...
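One quick way to rule this out is to compare the two settings before running evaluation. The sketch below is a hypothetical helper (the name `mismatched_settings` and the plain-dict configs are assumptions, not part of any training script) that reports which pixel options differ between a training config and a test config.

```python
def mismatched_settings(
    train_cfg: dict,
    eval_cfg: dict,
    keys: tuple = ("image_min_pixels", "image_max_pixels"),
) -> dict:
    """Return {key: (train_value, eval_value)} for every key whose values differ."""
    return {
        k: (train_cfg.get(k), eval_cfg.get(k))
        for k in keys
        if train_cfg.get(k) != eval_cfg.get(k)
    }


if __name__ == "__main__":
    # Illustrative values only.
    train_cfg = {"image_min_pixels": 256 * 28 * 28, "image_max_pixels": 1280 * 28 * 28}
    eval_cfg = {"image_min_pixels": 256 * 28 * 28, "image_max_pixels": 512 * 28 * 28}
    for key, (train_value, eval_value) in mismatched_settings(train_cfg, eval_cfg).items():
        print(f"{key}: train={train_value} eval={eval_value}")
```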

To fine-tune the vision module with LoRA, use the `finetune_lora_vision.sh` script. If there are layers you want to keep frozen, list them in the `--lora_namespan_exclude` option: ```bash --lora_namespan_exclude "['lm_head', 'embed_tokens',...

I'm gonna try but I'm not sure.

GRPO training has been added to the repository. Feedback and PRs are always welcome :)

I didn't run into any bugs when I tried it. I'll take a look and let you know what is wrong.

I've run a simple fine-tuning test with about 20 samples, both mixed-modality and non-mixed-modality. It didn't produce any bugs. GPU spec: ``` +-----------------------------------------------------------------------------------------+ | NVIDIA-SMI 550.120 Driver Version:...

I was using Docker, and if you need the same env I'm using, I'll make an image for it. Also, can you let me know what problem are...