Xiaohui Chen
Hi, I am actually going to run the algorithm on a pretty large graph (15,000 nodes with millions of edges). Will there be any problem if I run it...
> > Hi, I got a training curve like this, is it normal? Do you mind sharing your trainer_state.json? thx!
> >
> > Yes, it's quite normal.

What about the benchmark...
I am using the "v1" prompt for training and inference. Did you use the same?
Thanks! Is it the same for Qwen3-VL-8B/4B?
Thanks for the great work! A follow-up question on the Qwen3-VL-Thinking series: may I know what decoding strategy is used (max_new_tokens, temperature, etc.)? Thanks!
Sounds good, I am not using vllm. For the model in transformers, I hope the following parameters are fine:

```python
generated_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.0,
    temperature=0.6,
)
```
I was able to reproduce a similar result (70.5) for the instruct model; try using greedy decoding.
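For anyone unsure what switching to greedy decoding means here: instead of sampling from the softmax distribution (the `do_sample=True` setup above), greedy decoding simply takes the argmax token at each step; with transformers this corresponds to passing `do_sample=False` to `model.generate`. A toy sketch of the per-step selection, with made-up logits and no real model assumed:

```python
# Toy illustration: greedy decoding picks the highest-scoring token
# at every step instead of sampling from the distribution.
# (With transformers, this is model.generate(..., do_sample=False).)

def greedy_pick(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical per-step logits over a 3-token vocabulary.
step_logits = [
    [0.1, 2.3, -1.0],  # step 1: token 1 has the highest score
    [1.5, 0.2, 0.4],   # step 2: token 0 has the highest score
]
decoded = [greedy_pick(step) for step in step_logits]
print(decoded)  # [1, 0]
```

Greedy decoding is deterministic, which makes benchmark numbers reproducible across runs, unlike the sampled setup above.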
Hi @lihengtao, did you obtain a result for the 8B thinking model? I only got 67 using the transformers backend! @ycsun1972, following up on this, I ran the thinking mode via vllm using the...
I am using the transformers backend; do_resize=False is passed, and I got this output when printing video_grid_thw:

```
tensor([[102, 40, 72]])
tensor([[96, 40, 72]])
tensor([[93, 22, 40]])
tensor([[113, 40, 72]])
tensor([[110,...
```