Wei-Lin Chiang

Results 111 comments of Wei-Lin Chiang

@ericzhou571 would you mind moving the training scripts into a separate PR? Let's make sure its inference works smoothly in this PR first.

@Zhilin123 thanks for the PR and for generating these results. Very helpful. We're working on fixing this too. Since changing the judge model affects the results quite a lot, we are still reviewing...

@odelalleau thanks! Recently we've been working hard on a pipeline to generate our next-generation benchmark ([Arena-Hard](https://github.com/lm-sys/arena-hard), as you mentioned), which we believe offers significantly better separability than MT-bench....

Got it. Yes, it makes sense, and sorry for the delay; there's a lot going on right now. We'll for sure merge this fix into v1.1, as judge model...

Actually, @DSYZayn, could you double-check that all the places where we dump conversation data to JSON files are fixed?

Feel free to reopen if there's still an issue.

Closing it for now. Feel free to reopen if you still have questions.

@vince62s feel free to reopen if you still have questions.