Qsingle
> Does anyone have insight into why transformers >=4.52 throws this error? I'm working with a finetuned 3B Qwen model that is only compatible with transformers 4.52 and higher, so...
We do not use the mask decoder; instead, we fine-tune the model with a single segmentation head. So we must tell the task head how many classes it needs to...
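For illustration, here is a minimal sketch of what such a task head could look like. The class name, the 1x1-conv design, and the `num_classes` argument are assumptions for this example, not this repo's actual API:

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Illustrative per-pixel classification head; num_classes must match the dataset."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        # A 1x1 conv maps encoder features to one logit map per class
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(features)
        # Upsample the logits back toward the input resolution (factor is illustrative)
        return nn.functional.interpolate(
            logits, scale_factor=16, mode="bilinear", align_corners=False
        )

# e.g. binary foreground/background segmentation -> num_classes=2
head = SegmentationHead(in_channels=256, num_classes=2)
```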
> @Qsingle Thank you for your reply. In my case, each image has about 10-14 structures of the same class (the exact number is unknown unless I run some object...
This repo does not support multi-modal training. You can use [this repo](https://github.com/Qsingle/verl) to run Gemma3 with higher performance.
> hello, I still encounter the following error; how do you solve it? I replaced this `raise` with `continue` to get past it. > > File "/verl-main-Qsingle/verl/utils/fsdp_utils.py", line 123, in get_fsdp_wrap_policy > raise...
> hello, I can run GRPO with gemma3-1b-it on the gsm8k dataset. During training, after some steps, the grad norm becomes NaN. This is my start...
> 1. I have modified the attention implementation at line 214, but I still encounter the same error: the grad_norm is NaN. > 2. When I run gemma3-1b-it, I confirm...
The PR [#2327](https://github.com/volcengine/verl/pull/2327) can be used for the RL training, and [trl](https://huggingface.co/docs/trl/index) is better suited for the SFT process. I've checked it on a Blackwell-series GPU like the Pro 6000, so it...
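As a rough starting point for the SFT side, here is a minimal trl sketch. The model id, the dataset flattening, and the eager-attention setting are assumptions for this example, not a confirmed recipe:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# gsm8k from the thread; flatten each example into a single "text" field for SFT
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda ex: {"text": ex["question"] + "\n" + ex["answer"]})

config = SFTConfig(
    output_dir="gemma3-1b-it-sft",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    bf16=True,          # bf16 tends to be more numerically stable than fp16
    max_grad_norm=1.0,  # gradient clipping as a guard against NaN grad norms
    model_init_kwargs={"attn_implementation": "eager"},  # per the attention discussion above
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

When the model is passed as a string, trl instantiates it with the `model_init_kwargs` given in the config.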
> I'll try. I am researching the open domain, especially the medical domain, and I find your work excellent. Can we discuss more via other platforms? Email me at [[email protected]](mailto:[email protected])...
Good question. 24 GB of memory is enough for all of the models. If you use vit-b, 12 GB of memory can run the model.
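If you want to sanity-check this on your own setup, a quick way to estimate the weight footprint is sketched below; everything here is illustrative, and real training memory also includes gradients, optimizer states, and activations:

```python
import torch

def param_memory_gb(model: torch.nn.Module) -> float:
    # Bytes occupied by the weights alone
    n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return n_bytes / 1024 ** 3

# Example: ViT-B has roughly 86M parameters, so its fp32 weights take ~0.33 GB;
# gradients, optimizer states, and activations multiply the training footprint several-fold.
```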