meg
https://arxiv.org/abs/1608.07187v3 https://wefe.readthedocs.io/en/latest/_modules/wefe/metrics/WEAT.html
`measurements/word_length` appears to return the average number of words in the input string(s), rather than the (average) word length. Word length measurements could be a helpful measurement to add, but...
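To make the distinction concrete, here is a minimal sketch of the two quantities. It assumes plain whitespace tokenization, and the function names are hypothetical; the actual `measurements/word_length` module may tokenize differently, so this illustrates only the difference between the two averages, not the module's code.

```python
from typing import List


def average_word_count(texts: List[str]) -> float:
    """Average number of words per input string (whitespace-split)."""
    if not texts:
        return 0.0
    counts = [len(text.split()) for text in texts]
    return sum(counts) / len(counts)


def average_word_length(texts: List[str]) -> float:
    """Average number of characters per word across all input strings."""
    words = [word for text in texts for word in text.split()]
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


if __name__ == "__main__":
    sample = ["hello world", "a bc def"]
    print("average word count:", average_word_count(sample))
    print("average word length:", average_word_length(sample))
```

For the sample above, the first function averages over strings (2 and 3 words) while the second averages over individual words (5, 5, 1, 2, and 3 characters), so the two numbers answer different questions.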
Food for thought: adding a section that deals directly with red teaming, as that is becoming a major interest and concern. I called it "Safety Testing" so that it is not limited to red teaming and can...
[EDIT: May 12] Forgot I had done this.
Hello, I've tried running realtoxicityprompts (https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/realtoxicityprompts/) through the Hugging Face leaderboard backend code (https://huggingface.co/spaces/demo-leaderboard-backend/backend) on A10s and A100s. Each instance of the dataset takes at least 10 seconds to process....
I am setting up the Hugging Face Open LLM Leaderboard to use the realtoxicityprompts task through `simple_evaluate`. I keep receiving this warning: `Running generate_until requests: 0%| | 13/99442 [10:42
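
For a sense of scale, the numbers visible in the truncated progress bar can be turned into a rough time estimate. This is a back-of-the-envelope sketch, not part of the harness: it assumes `10:42` is elapsed time in MM:SS (as tqdm normally prints it) and that throughput stays constant, and the helper names are hypothetical.

```python
def parse_clock(clock: str) -> int:
    """Convert a tqdm-style clock string (MM:SS or HH:MM:SS) to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def estimate_remaining_seconds(done: int, total: int, elapsed_seconds: float) -> float:
    """Extrapolate remaining time, assuming a constant per-request cost."""
    if done <= 0:
        raise ValueError("need at least one completed request to extrapolate")
    seconds_per_request = elapsed_seconds / done
    return (total - done) * seconds_per_request


if __name__ == "__main__":
    # Values read off the progress bar: 13 of 99442 requests after 10:42.
    elapsed = parse_clock("10:42")
    remaining = estimate_remaining_seconds(13, 99442, elapsed)
    print(f"{elapsed / 13:.1f} s per request")
    print(f"about {remaining / 86400:.0f} days remaining at this rate")
```

Under those assumptions the run works out to roughly 49 seconds per request and on the order of eight weeks for the full task, which is why the per-request cost matters far more here than the choice of GPU.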