simveit

Results 35 comments of simveit

@zhaochenyang20 I think for LIMO we don’t have reference results from there. This is because they used this dataset for training, not evaluation. But maybe someone else did such a...

@zhaochenyang20 this PR adjusts the script to include the new way of evaluating suggested in the DeepSeek repo ``` For all our models, the maximum generation length is set to 32,768...

@zhaochenyang20 32.2% instead of 28.9% was for [AIME 2024](https://huggingface.co/datasets/Maxwell-Jia/AIME_2024). The 28.9% is from the DeepSeek-R1 repo for the Qwen 1.5B distill. I will evaluate on LIMO later today. This will take more...

Hi @zhaochenyang20, today I ran the benchmark on the LIMO dataset, this time with 8 tries for each question. The accuracy was marginally higher than with one try (see updated README for...

@zhaochenyang20 I have now integrated improved parsing and a benchmark for AIME 2025. I think this is close to being ready to merge.

I don't understand. I used the [router in a one-node setting](https://docs.sglang.ai/router/router.html#co-launch-router-and-runtimes). ``` python3 -m sglang_router.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --port 30000 --dp-size 4 ``` Do you mean [this way](https://docs.sglang.ai/router/router.html#launch-runtimes-and-router-separately) of launching runtime and...

@zhaochenyang20 maybe someone can take over from here. The only thing that remains to be done is to run the benchmark multiple times.

Now that you say it, maybe it's cleaner to make this a PR and let me write a separate issue for the benchmarking. This code is working and...

Yes, we can merge this PR. I will write the issue later.