Alternative readability metrics

Open thisisnic opened this issue 4 years ago • 0 comments

Currently docreview uses Flesch-Kincaid to analyse the vignettes. However, other metrics could be used alongside or instead of it and might provide more useful analyses.

Factors identified by Pitler and Nenkova (2008):

  * LogL of Discourse Relations (r = .4835): nope!
  * LogL, NEWS (r = .4497): log likelihood of an article based on a source; the more likely an article is, the better
  * Average Verb Phrases (r = .4213): number of verb phrases (could use udpipe? https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html)
  * LogL, WSJ (r = .3723): log likelihood of an article based on a source; the more likely an article is, the better
  * Number of words (r = -.3713): longer articles are considered less readable

https://aclanthology.org/D08-1020.pdf

  1. general word familiarity (use a commonly used corpus): how probable an article is based on its vocabulary (article likelihood)
  2. technical word familiarity (scrape r4ds and some other good sources)
  3. document length
  4. verb phrases
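
A rough sketch of how items 1 to 3 might be computed, assuming "article likelihood" means a unigram log likelihood against a background corpus, i.e. the sum over words of count(w) * log P(w | corpus). The add-one smoothing, the crude tokenizer, and the names `tokenize`, `unigram_log_likelihood` and `familiarity_report` are all assumptions made for illustration, not anything docreview provides. It is written in Python only to show the idea; an R version would follow the same steps.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_log_likelihood(document, background):
    """Sum of count(w) * log P(w | background) over the document's words.

    Add-one smoothing is an assumption here, so that words missing from
    the background corpus still get a small non-zero probability.
    """
    bg_counts = Counter(tokenize(background))
    doc_counts = Counter(tokenize(document))
    # The vocabulary covers both texts so unseen document words are counted.
    vocab_size = len(set(bg_counts) | set(doc_counts))
    bg_total = sum(bg_counts.values())
    return sum(
        count * math.log((bg_counts[word] + 1) / (bg_total + vocab_size))
        for word, count in doc_counts.items()
    )


def familiarity_report(document, general_corpus, technical_corpus):
    """Collect document length plus general and technical word familiarity."""
    n_words = len(tokenize(document))
    general = unigram_log_likelihood(document, general_corpus)
    technical = unigram_log_likelihood(document, technical_corpus)
    return {
        "n_words": n_words,              # item 3: document length
        "general_logl": general,         # item 1: general word familiarity
        "technical_logl": technical,     # item 2: technical word familiarity
        # Per-word averages, since the raw sums become more negative as
        # documents get longer regardless of vocabulary.
        "general_logl_per_word": general / n_words if n_words else 0.0,
        "technical_logl_per_word": technical / n_words if n_words else 0.0,
    }
```

Here `general_corpus` and `technical_corpus` are plain strings standing in for whatever corpora end up being chosen (for example, text scraped from r4ds for the technical one). Verb phrases (item 4) are left out because they need a parser or POS tagger.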

thisisnic, Jul 20 '21 15:07