Baber Abbasi

Results: 51 comments by Baber Abbasi

I don't quite get the distinction between `set_wise_compute` and `aggregation`. You said it's equivalent to aggregation, but then why is there a condition in your class?

hf (pretrained=meta-llama/Llama-2-7b-hf), limit: None, num_fewshot: None, batch_size: 2

| Task            | Version | Filter | Metric   |  Value |   | Stderr | Paper |
|-----------------|---------|--------|----------|-------:|---|-------:|------:|
| agieval_aquarat | Yaml    | none   | acc      | 0.2362 | ± | 0.0267 |  23.2 |
|                 |         | none   | acc_norm | 0.2244 | ± | 0.0262 |       |

...

@lintangsutawika Ran `truncation` on `llama-7b` and results are pretty much the same. :(

@lintangsutawika not yet! Will try to verify and finalize soon.

This was without setting an explicit filter in the yaml while testing `predict_only`. Also, doesn't just picking the first generation defeat the whole point of repeat? Why is it even...

Aah, that makes more sense. Wouldn't pass@k be more appropriate for the default? IMO that's a bit more intuitive (plus a warning if it's just greedy sampling).
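
To make "pass@k as the default" concrete: the standard unbiased estimator (from the Codex paper, Chen et al., 2021) only needs the number of generations per doc and how many of them passed, which is exactly what repeated generations give you. A minimal sketch, with the function name purely illustrative and not the harness's actual API:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c of them correct),
    is correct. Computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer incorrect generations than k, so some correct one is always drawn.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 generations per doc, 3 of them correct:
# pass_at_k(10, 3, 1) == 0.3
# pass_at_k(10, 3, 5) ≈ 0.917
```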

@haileyschoelkopf ready for review! I called the metric `bypass`. I don't know if you want to keep that or change it to `predict_only`. Also had to get a bit hacky...

OK, have tested it out and it looks to be in working order. No major nits, except the `loglikelihood` table gets printed like this: ``` hf (pretrained=EleutherAI/pythia-70m,device=mps), gen_kwargs: (None), limit: 20.0, num_fewshot:...

It works for both tasks, but I can't think of when people would want to use it for loglikelihood. Maybe for debugging or experimenting? I can change it so that only...

Also noticed we aren't passing `do_sample` when people change the gen_kwargs. Should we force that, or is it expected that people know they need to pass it as well? There's...
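
For context on why `do_sample` matters: with Hugging Face `generate()`, sampling parameters like `temperature` or `top_p` only take effect when `do_sample=True`; otherwise decoding stays greedy (newer transformers versions warn about this). A minimal sketch, reusing `EleutherAI/pythia-70m` from above purely as an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

inputs = tok("The capital of France is", return_tensors="pt")

# Greedy: temperature is passed but has no effect on the chosen tokens,
# because sampling is disabled by default.
greedy = model.generate(**inputs, max_new_tokens=10, temperature=0.7)

# Sampling: do_sample=True is what actually switches the decoding strategy,
# so gen_kwargs that only set temperature won't change the output on their own.
sampled = model.generate(**inputs, max_new_tokens=10, temperature=0.7, do_sample=True)
```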