measurements/word_length naming

Open meg-huggingface opened this issue 2 years ago • 0 comments

measurements/word_length appears to return the average number of words in the input string(s), rather than the (average) word length. Word length would be a helpful measurement to add, but this measurement doesn't compute it, so its name is misleading compared to what I expected.

Some useful, distinct measurements in this family would include:

  • sentence_length: The average number of characters in the input and/or the average number of words in a sentence
  • word_length: The average length of words in the input
  • word_count: The number of words in the input
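To make the distinction concrete, here is a minimal sketch of the three quantities. It assumes plain whitespace tokenization, which may differ from whatever tokenizer the existing measurements use, and the function names are hypothetical illustrations, not part of the evaluate API.

```python
from statistics import mean


def tokenize(text):
    # Assumption: split on whitespace only; the real measurement may
    # use a different tokenizer.
    return text.split()


def words_per_input(texts):
    """Average number of words per input string.

    This is what the issue reports word_length as currently returning.
    """
    return mean(len(tokenize(t)) for t in texts)


def chars_per_input(texts):
    """Average number of characters per input string (no tokenizing)."""
    return mean(len(t) for t in texts)


def word_length(texts):
    """Average number of characters per word across all inputs."""
    words = [w for t in texts for w in tokenize(t)]
    return mean(len(w) for w in words) if words else 0.0


def word_count(texts):
    """Total number of words across all inputs."""
    return sum(len(tokenize(t)) for t in texts)


if __name__ == "__main__":
    data = ["hello world", "a bc def"]
    print(words_per_input(data))
    print(chars_per_input(data))
    print(word_length(data))
    print(word_count(data))
```

Note that words_per_input is just word_count divided by the number of inputs, which is why folding it into word_count would be a small change.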

I propose:

  • Renaming this one (perhaps to sentence_length or string_length), and considering also returning the number of characters (without tokenizing) as an additional option.
  • Or else incorporating it into the existing word_count measurement, which would simply mean averaging what is already calculated there.

meg-huggingface, Aug 21 '22 22:08