Results: 91 comments of yxchng

@hello-bluedog These are only the annotations; there are no videos. What I'm asking is where to download the corresponding videos.

@PhoenixZ810 Can lmdeploy use multiple GPUs? Right now evaluation is extremely slow, especially when evaluating R1-like models that produce tens of thousands of output tokens.

Is AmazonCounterfactualClassification 83.16 or 86.15? Why does the leaderboard show 86.15? My reproduction gives 83.16, similar to the number reproduced in this issue above.

@KennethEnevoldsen Yes, this is how I run it. The result above also shows 83.16. Is the result in the screenshot wrong?

@jackyoung96 Are you saying that you are able to get 70.7 and 77.0 for llama-3.1-8b-instruct?

@ganler Are you aware of any methods I can use to bring the results closer?