Xiaohui Chen

13 comments of Xiaohui Chen

Hi, I am actually going to run the algorithm on a pretty large graph (15,000 nodes with millions of edges). So there should be no problem if I run it...

> > Hi, I got a training curve like this, is it normal? Do you mind sharing your trainer_state.json? thx!
>
> Yes, it's quite normal.

What about the benchmark...

I am using "v1" prompt for training and inference. Did you use the same?

Thanks! Is it the same for Qwen3-VL-8B/4B?

Thanks for the great work! A follow-up question on the Qwen3-VL-Thinking series: may I know what decoding strategy was used (max_new_tokens, temperature, etc.)? Thanks!

Sounds good, I am not using vLLM. So for the model in transformers, I hope the following parameters are fine:

```python
generated_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.0,
    temperature=0.6,
)
```
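For completeness, the Qwen-VL examples usually trim the prompt tokens off each generated sequence before decoding; the field name `input_ids` below follows the usual processor output and is an assumption on my side. A minimal sketch of that trimming step on plain lists (with tensors the slicing is the same idea):

```python
# Strip the prompt tokens from each generated sequence before decoding.
# Sketch only: uses plain lists instead of real tensors.
def trim_prompt(input_ids, generated_ids):
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]

prompt = [[1, 2, 3]]
full_output = [[1, 2, 3, 7, 8, 9]]
print(trim_prompt(prompt, full_output))  # [[7, 8, 9]]
```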

I was able to reproduce a similar result (70.5) for the instruct model; try using greedy decoding.
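In transformers, greedy decoding is just `model.generate(**inputs, do_sample=False)`; conceptually it picks the argmax token at each step. A toy sketch of that idea with hypothetical per-step logits (not the real model):

```python
def greedy_decode(step_logits):
    """Pick the highest-scoring token index at each step (greedy decoding)."""
    return [max(range(len(logits)), key=logits.__getitem__) for logits in step_logits]

# Hypothetical logits over a 4-token vocabulary for 3 decoding steps.
steps = [
    [0.1, 2.0, 0.3, 0.0],
    [1.5, 0.2, 0.1, 0.4],
    [0.0, 0.1, 0.2, 3.0],
]
print(greedy_decode(steps))  # [1, 0, 3]
```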

Hi @lihengtao, did you obtain a result for the 8B thinking model? I only got 67 using the transformers backend! @ycsun1972, as a follow-up on this, I ran the thinking mode via vLLM using the...

I am using the transformers backend, do_resize=False is passed, and I got this info when printing video_grid_thw:

```
tensor([[102, 40, 72]])
tensor([[96, 40, 72]])
tensor([[93, 22, 40]])
tensor([[113, 40, 72]])
tensor([[110,...
```
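If I read those rows right, each one is (t, h, w) in patch units, so they can be sanity-checked against the visual token count. Assuming a 2x2 spatial merge (merge_size=2, as in the Qwen-VL processors; this is an assumption on my side, not a confirmed Qwen3-VL detail), the token count per video would be t*h*w/4:

```python
# Rough visual-token-count check from video_grid_thw rows (t, h, w in patch units).
# Assumes merge_size=2 (2x2 spatial merge) -- an assumption, not a confirmed detail.
MERGE = 2

def visual_tokens(t, h, w, merge=MERGE):
    return (t * h * w) // (merge * merge)

grids = [(102, 40, 72), (96, 40, 72), (93, 22, 40)]
for g in grids:
    print(g, "->", visual_tokens(*g))
```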