Zongxia Li
Refer to the [script](https://github.com/zli12321/Vision-SR1/blob/main/train_examples/1-7b_visionR1_train.sh). The main changes are the reward function and the prompt template: the reward function scores only the final answer, and the prompt uses a CoT template.
Currently EasyR1 does not support LoRA. The official repo says to ```Use worker.actor.fsdp.torch_dtype=bf16 and worker.actor.optim.strategy=adamw_bf16 to enable bf16 training.``` If you still do not have enough memory, [ModelScope/Swift](https://github.com/modelscope/ms-swift) support...
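For reference, those two command-line overrides correspond to a config shape like the following. This is a sketch: the nesting is inferred from the override names above and EasyR1's example configs, so check it against your version before relying on it.

```yaml
# Sketch of the bf16 settings in an EasyR1-style config
# (nesting assumed from the override names, not copied from the repo).
worker:
  actor:
    fsdp:
      torch_dtype: bf16       # keep FSDP parameters/gradients in bf16
    optim:
      strategy: adamw_bf16    # AdamW with bf16 optimizer states
```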
About 3 hours for SFT. For RL we cut training off at a stopping point, since running to completion on the 47K examples could take weeks.
Yes. Just use ctrl+c to kill the training program. The stopping point is determined by checking convergence on the validation dataset: once your expected validation performance has converged, it...
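The "has the validation score converged?" check above is something we eyeball, but it can be sketched as a simple plateau test. The helper below is hypothetical, not part of the Vision-SR1 codebase; the window size and tolerance are arbitrary illustration values.

```python
def has_converged(val_scores, window=3, tol=0.005):
    """Return True when the last `window` validation scores differ by
    less than `tol`, i.e. the run has effectively plateaued.
    (Hypothetical helper -- not part of the Vision-SR1 codebase.)"""
    if len(val_scores) < window:
        return False
    recent = val_scores[-window:]
    return max(recent) - min(recent) < tol

# Scores climbing, then flat: once the tail is flat it is safe to
# ctrl+c the run and keep the most recent checkpoint.
scores = [0.41, 0.48, 0.52, 0.545, 0.546, 0.547]
print(has_converged(scores))  # True: last three scores span < 0.005
```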
Refer to some of the [training config files](https://github.com/zli12321/Vision-SR1/blob/main/train_examples/selfReward_config.yaml). Search for ```save_freq``` to change the saving frequency; currently it saves every 15 steps.
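For reference, the relevant field looks roughly like this (the key name comes from the linked file; the exact nesting is assumed and may differ in your config):

```yaml
trainer:
  save_freq: 15   # checkpoint every 15 steps; lower this to save more often
```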
They should be available now: [POPE](https://huggingface.co/datasets/zli12321/pope) and [MM-Vet](https://huggingface.co/datasets/zli12321/mm-vet).
It's [here](https://huggingface.co/LMMs-Lab-Turtle/Vision-SR1-3B). The 3B model is less popular, though.
Thank you for pointing the issue out. Indeed, some paths were misplaced. We are rerunning the MCQ now to make sure the results are put under the right paths. If it is...
We just confirmed that the MMMU accuracy is 55.3. Thanks for pointing it out; we will update all the results accordingly.