Leonid Sinev
Mentioning issues that request some sort of chat templating, hoping those subscribed will take a peek at (and maybe test) this pull request: https://github.com/EleutherAI/lm-evaluation-harness/issues/1098 https://github.com/EleutherAI/lm-evaluation-harness/issues/1209 https://github.com/EleutherAI/lm-evaluation-harness/issues/1490
By the way, you can take a peek at a previous attempt at a chat-templating PR: https://github.com/EleutherAI/lm-evaluation-harness/pull/1287 Just in case any pitfalls were discussed or mentioned there.
This detailed comment may also be interesting to readers here: https://github.com/EleutherAI/lm-evaluation-harness/issues/1560#issuecomment-1999204933
> there are going to be more custom templates
>
> it is important to apply the template for that use case

So, the template name shouldn't be fixed in...
Thank you for your efforts! A great table of results to compare!

> Where did the difference come from?

Please check other issues/discussions about speed, batching, and multi-GPU usage for...
> Is this also an expected result?

No idea. According to the results in your table, it is also a task-dependent issue. You may want to research this case further with...
Thanks for your help. I will try to use the described solution while experimenting with moving a custom Python Task to YAML form using ConfigurableTask. Not sure about the time frame of...
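For context, a YAML-based ConfigurableTask in lm-evaluation-harness looks roughly like the sketch below. The field names follow the harness's task-config convention, but the task and dataset names here are hypothetical placeholders, not from any real conversion:

```yaml
# Hypothetical task/dataset names; field names follow lm-evaluation-harness task configs.
task: my_custom_task
dataset_path: my_org/my_dataset
output_type: multiple_choice
training_split: train
test_split: test
doc_to_text: "{{question}}"
doc_to_choice: "{{choices}}"
doc_to_target: "{{answer}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
```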
Just linking another issue (not sure if it is the only one) about inference time reporting: https://github.com/EleutherAI/lm-evaluation-harness/issues/1236
Made it backward compatible and added a seeds report to the results file. Also updated the code to be compatible with the main branch for ease of merging. Please check this out, @haileyschoelkopf
@djstrong What do you think of this suggested workaround with logit_bias? https://github.com/EleutherAI/lm-evaluation-harness/issues/1196#issuecomment-1948246171
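As I understand the linked suggestion, the idea behind a `logit_bias` workaround is to additively bias a small set of answer tokens so decoding is effectively restricted to them (the OpenAI-style parameter clamps values to [-100, 100]). A toy sketch of that mechanism, with hypothetical token IDs and scores not tied to any real tokenizer:

```python
def apply_logit_bias(logits: dict[int, float], bias: dict[int, float]) -> dict[int, float]:
    """Add per-token biases, clamped to [-100, 100], to raw logits."""
    return {
        tok: score + max(-100.0, min(100.0, bias.get(tok, 0.0)))
        for tok, score in logits.items()
    }

def greedy_pick(logits: dict[int, float]) -> int:
    """Pick the highest-scoring token id (greedy decoding)."""
    return max(logits, key=logits.get)

# Hypothetical vocabulary: token 32 = "A", token 33 = "B", token 50 = "Sure".
raw = {32: 1.0, 33: 0.5, 50: 4.0}

# Strongly favour the answer tokens "A"/"B" over everything else.
biased = apply_logit_bias(raw, {32: 100, 33: 100})

print(greedy_pick(raw))     # unconstrained: prints 50
print(greedy_pick(biased))  # restricted to the answer set: prints 32
```

Whether this generalizes beyond single-token answer sets is exactly the kind of thing worth checking against the discussion in that issue.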