Zongxia Li
Refer to the [script](https://github.com/zli12321/Vision-SR1/blob/main/train_examples/1-7b_visionR1_train.sh). The main changes are the reward function and the prompt template: the reward function scores only the final answer, and the prompt uses a CoT template.
Currently EasyR1 does not support LoRA. The official repo says to ```Use worker.actor.fsdp.torch_dtype=bf16 and worker.actor.optim.strategy=adamw_bf16 to enable bf16 training.``` If you still do not have enough memory, [ModelScope/Swift](https://github.com/modelscope/ms-swift) support...
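For reference, those two command-line overrides correspond to a config shape like the following. This is a sketch: the nesting is inferred from the override names above and EasyR1's example configs, so check it against your version before relying on it.

```yaml
# Sketch of the bf16 settings in an EasyR1-style config
# (nesting assumed from the override names, not copied from the repo).
worker:
  actor:
    fsdp:
      torch_dtype: bf16       # keep FSDP parameters/gradients in bf16
    optim:
      strategy: adamw_bf16    # AdamW with bf16 optimizer states
```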
About 3 hours for SFT. For RL we cut training off at a stopping point, since running to completion on the 47K examples could take weeks.
Yes. Just use ctrl+c to kill the training program. The stopping point is determined by checking convergence on the validation dataset: once your expected validation performance has converged, it...
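The "has the validation score converged?" check above is something we eyeball, but it can be sketched as a simple plateau test. The helper below is hypothetical, not part of the Vision-SR1 codebase; the window size and tolerance are arbitrary illustration values.

```python
def has_converged(val_scores, window=3, tol=0.005):
    """Return True when the last `window` validation scores differ by
    less than `tol`, i.e. the run has effectively plateaued.
    (Hypothetical helper -- not part of the Vision-SR1 codebase.)"""
    if len(val_scores) < window:
        return False
    recent = val_scores[-window:]
    return max(recent) - min(recent) < tol

# Scores climbing, then flat: once the tail is flat it is safe to
# ctrl+c the run and keep the most recent checkpoint.
scores = [0.41, 0.48, 0.52, 0.545, 0.546, 0.547]
print(has_converged(scores))  # True: last three scores span < 0.005
```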
Refer to some of the [training config files](https://github.com/zli12321/Vision-SR1/blob/main/train_examples/selfReward_config.yaml). Search for ```save_freq``` to change the saving frequency; currently it saves every 15 steps.
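For reference, the relevant field looks roughly like this (the key name comes from the linked file; the exact nesting is assumed and may differ in your config):

```yaml
trainer:
  save_freq: 15   # checkpoint every 15 steps; lower this to save more often
```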
They should be available now: [POPE](https://huggingface.co/datasets/zli12321/pope) and [MM-Vet](https://huggingface.co/datasets/zli12321/mm-vet).
It's [here](https://huggingface.co/LMMs-Lab-Turtle/Vision-SR1-3B). The 3B model is less popular, though.
Thank you for pointing the issue out. Indeed, some paths were misplaced. We are rerunning the MCQ now to make sure the results are put under the right paths. If it is...
We just confirmed that the MMMU accuracy is 55.3. Thanks for pointing it out; we will update all the results accordingly.