Yezhen Wang comments



Questions about reproducing the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks"

Perhaps they ran several random seeds and reported the average, or they simply reported the highest score.
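
To show how far apart those two reporting choices can land, here is a minimal sketch. The per-seed scores are made-up placeholders, not numbers from the paper or from any actual run, and `scores_by_seed` and `summarize` are hypothetical names used only for illustration.

```python
import statistics

# Hypothetical per-seed accuracies, invented for illustration only.
# They are NOT results from the paper or from any real fine-tuning run.
scores_by_seed = {0: 0.912, 1: 0.905, 2: 0.921, 3: 0.899, 4: 0.917}


def summarize(scores):
    """Return the mean, sample standard deviation, and best score across seeds."""
    values = list(scores.values())
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "best": max(values),
    }


summary = summarize(scores_by_seed)
print(f"mean over seeds: {summary['mean']:.4f} (stdev {summary['stdev']:.4f})")
print(f"best single seed: {summary['best']:.4f}")
```

If a reproduction lands near the mean while the reported number is closer to the best seed, the gap could come from the reporting choice alone rather than from a setup difference.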

Any plan for releasing the evaluation code?

> Hello, thank you for your interest in LLaDA. We plan to open-source the evaluation metrics for the LLaDA Base model using the lm-evaluation-harness library. This may take some time...