Xiaohui Chen
Hi, I am actually going to run the algorithm on a pretty large graph (15,000 nodes with millions of edges). Will there be any problem if I run it...
> > Hi, I got a training curve like this, is it normal? Do you mind sharing your trainer_state.json? thx!
> >
> > Yes, it's quite normal.

What about the benchmark...
I am using the "v1" prompt for training and inference. Did you use the same?
Thanks! Is it the same for Qwen3-VL-8B/4B?
Thanks for the great work! A follow-up question on the Qwen3-VL-Thinking series: may I know what decoding strategy is used (max_new_tokens, temperature, etc.)? Thanks!
Sounds good, I am not using vllm. For the model in transformers, I hope the following parameters are fine:

```python
generated_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.0,
    temperature=0.6,
)
```
I was able to reproduce a similar result (70.5) for the instruct model; try using greedy decoding.
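For anyone unsure what switching to greedy decoding means here: instead of sampling from the softmax distribution (the `do_sample=True` setup above), greedy decoding simply takes the argmax token at each step; with transformers this corresponds to passing `do_sample=False` to `model.generate`. A toy sketch of the per-step selection, with made-up logits and no real model assumed:

```python
# Toy illustration: greedy decoding picks the highest-scoring token
# at every step instead of sampling from the distribution.
# (With transformers, this is model.generate(..., do_sample=False).)

def greedy_pick(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical per-step logits over a 3-token vocabulary.
step_logits = [
    [0.1, 2.3, -1.0],  # step 1: token 1 has the highest score
    [1.5, 0.2, 0.4],   # step 2: token 0 has the highest score
]
decoded = [greedy_pick(step) for step in step_logits]
print(decoded)  # [1, 0]
```

Greedy decoding is deterministic, which makes benchmark numbers reproducible across runs, unlike the sampled setup above.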
Hi @lihengtao, did you obtain a result for the 8B thinking model? I only got 67 using the transformers backend! @ycsun1972, following up on this, I ran the thinking mode via vllm using the...
I am using the transformers backend; do_resize=False is passed, and I got this output when printing video_grid_thw:

```
tensor([[102, 40, 72]])
tensor([[96, 40, 72]])
tensor([[93, 22, 40]])
tensor([[113, 40, 72]])
tensor([[110,...
```