meg issues


Add WEAT metric for bias testing

Paper: https://arxiv.org/abs/1608.07187v3
WEFE implementation: https://wefe.readthedocs.io/en/latest/_modules/wefe/metrics/WEAT.html

Labels: enhancement, bias
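The WEAT metric from the linked paper is small enough to sketch directly. This is a minimal NumPy sketch, not the WEFE implementation. It assumes each word has already been mapped to a dense vector, and the function names (`association`, `weat_statistic`, `effect_size`, `p_value`) are hypothetical. Following the paper, s(w, A, B) is the mean cosine similarity of w to attribute set A minus its mean cosine similarity to B. The effect size divides the difference in mean association between target sets X and Y by the standard deviation of s over X ∪ Y. The one-sided p-value is approximated by sampling random re-partitions of X ∪ Y instead of enumerating all of them.

```python
from typing import Sequence

import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w: np.ndarray, A: Sequence[np.ndarray], B: Sequence[np.ndarray]) -> float:
    """s(w, A, B): mean cosine similarity of w to A minus mean cosine similarity to B."""
    return float(np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B]))


def weat_statistic(X, Y, A, B) -> float:
    """s(X, Y, A, B): sum of s(x, A, B) over X minus sum of s(y, A, B) over Y."""
    return sum(association(x, A, B) for x in X) - sum(association(y, A, B) for y in Y)


def effect_size(X, Y, A, B) -> float:
    """Difference of mean associations of X and Y, divided by the std over X ∪ Y."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    # Population standard deviation (ddof=0); this is a choice made for the sketch.
    return float((np.mean(sx) - np.mean(sy)) / np.std(sx + sy))


def p_value(X, Y, A, B, n_samples: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation test, approximated by random re-partitions of X ∪ Y."""
    rng = np.random.default_rng(seed)
    # s(w, A, B) does not depend on the partition, so compute it once per word.
    s = np.array([association(w, A, B) for w in list(X) + list(Y)])
    n = len(X)
    observed = s[:n].sum() - s[n:].sum()
    exceed = 0
    for _ in range(n_samples):
        perm = rng.permutation(len(s))
        if s[perm[:n]].sum() - s[perm[n:]].sum() > observed:
            exceed += 1
    return exceed / n_samples


# Toy 2-D "embeddings": X leans towards attribute set A, Y leans towards B.
A = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
B = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]
X = [np.array([1.0, 0.1]), np.array([0.8, 0.2])]
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.8])]

print(weat_statistic(X, Y, A, B), effect_size(X, Y, A, B), p_value(X, Y, A, B))
```

The toy vectors only stand in for real embeddings. The choice of `ddof=0` for the standard deviation and the sampled (rather than exhaustive) permutation test would be worth checking against the WEFE code before comparing numbers.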

measurements/word_length naming

`measurements/word_length` appears to return the average number of words in the input string(s), rather than the average length of the words. A word-length measurement could be a helpful addition, but...
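To make the distinction in this issue concrete, here is a minimal sketch contrasting the two quantities. It assumes plain whitespace tokenization (the actual measurement may use a different tokenizer), and both function names are hypothetical.

```python
from typing import List


def mean_words_per_string(texts: List[str]) -> float:
    """What the issue says the measurement reports:
    the average number of whitespace-separated words per input string."""
    return sum(len(t.split()) for t in texts) / len(texts)


def mean_word_length(texts: List[str]) -> float:
    """What the name suggests: the average number of characters per word,
    pooled over every word in every input string."""
    words = [w for t in texts for w in t.split()]
    return sum(len(w) for w in words) / len(words)


texts = ["the cat sat", "hello world"]
print(mean_words_per_string(texts))  # 2.5
print(mean_word_length(texts))       # 3.8
```

Pooling over all words, rather than averaging each string's own mean word length, weights longer strings more heavily; either definition is defensible, so a new measurement would need to state which one it uses.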

Update model-card-annotated.md

Food for thought: adding a section that deals directly with red-teaming (since it is becoming a major interest and concern). I called it "Safety Testing" so that it is not limited to red-teaming and can...

Changing how running through all modules is done

[EDIT: May 12] Forgot I had done this.

Question: Realtoxicityprompts takes >10 seconds per query; is this expected behavior?

Hello, I've tried running realtoxicityprompts (github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/realtoxicityprompts/) through the Hugging Face leaderboard backend code (https://huggingface.co/spaces/demo-leaderboard-backend/backend) on A10s and A100s. Each example in the dataset takes at least 10 seconds to process...

max_new_tokens and max_length conflict

While setting up the Hugging Face Open LLM Leaderboard to use the realtoxicityprompts task through `simple_evaluate`, I keep receiving this warning: `Running generate_until requests: 0%| | 13/99442 [10:42`...