`measurements/word_length` appears to return the average number of words in the input string(s), rather than the (average) word length. A word-length measurement could be helpful to add, but...
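To make the distinction concrete, here is a minimal sketch of the two quantities. It assumes plain whitespace splitting, which may differ from whatever tokenizer the actual measurement uses, and the function names `average_word_count` and `average_word_length` are hypothetical, chosen only for illustration.

```python
def average_word_count(texts):
    """Mean number of whitespace-separated words per input string."""
    if not texts:
        return 0.0
    counts = [len(text.split()) for text in texts]
    return sum(counts) / len(counts)


def average_word_length(texts):
    """Mean number of characters per word, pooled over all input strings."""
    # Assumption: words are split on whitespace; punctuation stays attached.
    words = [word for text in texts for word in text.split()]
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


samples = ["hello world", "a bb ccc"]
print(average_word_count(samples))   # 2.5 words per string
print(average_word_length(samples))  # 3.2 characters per word
```

On the same input the two numbers differ (2.5 vs. 3.2), which is why a measurement named `word_length` that reports the first quantity can be misleading.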
Food for thought: adding a section that deals directly with red teaming (as that is becoming a big interest/concern). I called it "Safety Testing", so it's not limited to red teaming and can...
[EDIT: May 12] Forgot I had done this.
Hello, I've tried running realtoxicityprompts through the Hugging Face leaderboard backend code on A10s and A100s. Each instance of the dataset takes at least 10 seconds to process....
Setting up the Hugging Face Open LLM Leaderboard to use the realtoxicityprompts task through `simple_evaluate`. Keep receiving this warning: `Running generate_until requests: 0%| | 13/99442 [10:42