Baber Abbasi comments

Results: 51 comments of Baber Abbasi

add bypass metric

Updated the warning for `gen_kwargs`. The previous one said "these settings will be used over set parameters in yaml tasks" but we only _updated_ the dict though? Not sure which...
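A minimal sketch of the distinction raised in the comment above, assuming plain Python dicts. The variable names, keys, and values are hypothetical and are not taken from the library's code. Updating a dict lets the new keys win on conflict while the remaining YAML keys survive, whereas replacing it discards them:

```python
# Hypothetical values for illustration only; not the library's actual defaults.
yaml_gen_kwargs = {"temperature": 0.0, "max_gen_toks": 256, "until": ["\n\n"]}
cli_gen_kwargs = {"temperature": 0.7}

# Update semantics: CLI keys override matching YAML keys,
# but YAML keys that the CLI did not mention are kept.
merged = {**yaml_gen_kwargs, **cli_gen_kwargs}

# Replace semantics: only the CLI keys remain.
replaced = dict(cli_gen_kwargs)

print(merged)
print(replaced)
```

If the code only updates the dict, a warning saying that the CLI settings are "used over" the YAML parameters is accurate for conflicting keys only.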

add bypass metric

Added some drafts in the `Readme`! Also moved the table to the bottom, thought this big (and increasingly growing!) thing broke the flow of the document. Let me know what...

add bypass metric

Sounds good! Shouldn't this be merged after #1167? Looks like most of the workarounds here won't be needed after that. Also thinking about it, not really sold on the `predict_only`...

add bypass metric

> I think it's still valuable to have! For example, in the Llemma sympy-checked math tasks, for Maj@K at high K, doing the scoring actually takes way more wall-clock time...

Winogrande Performance Discrepancy

The Winogrande results on OpenLLM are 5-shot. Were your evals also 5-shot @JeevanBhoot? That could suggest something changed after `b281b09` in the few-shot split implementation if you're getting the same...

Add task table

@haileyschoelkopf check this when you get a chance. I think we should consider something like Google sheets or just csv. Markdown on its own might be too cluttered, esp with...

Add task table

@haileyschoelkopf Might have missed this!

Generator Error when evaluating GLUE and superGLUE

This should have been fixed in #1229. Are you on the latest commit?

Generator Error when evaluating GLUE and superGLUE

hmm. Can you provide the full command? The previous bug occurred only when using batch "auto".

Generator Error when evaluating GLUE and superGLUE

The second one looks like a tokenizer bug. @haileyschoelkopf