Alternative readability metrics
Currently docreview uses Flesch-Kincaid to analyse the vignettes. However, other metrics could be used alongside it or instead of it, and may provide more useful analyses.
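As a baseline for comparing any new metric, here is a minimal sketch of the standard Flesch-Kincaid grade-level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. It is not docreview's code: docreview may use a different variant (e.g. reading ease rather than grade level) and will tokenise differently. The syllable counter (vowel groups) and the sentence splitter (split on . ! ?) are crude heuristics assumed for this sketch.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of consecutive vowels (y included), minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level with naive sentence and word splitting."""
    # Naive sentence split on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


print(flesch_kincaid_grade("The cat sat. The dog ran."))
```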
Factors identified by Pitler and Nenkova (2008), with their correlation (r) with human readability ratings:
- LogL of Discourse Relations (r = .4835): nope!
- LogL, NEWS (r = .4497): log likelihood of an article under a language model trained on a source corpus (here NEWS); the more likely an article, the better
- Average Verb Phrases (r = .4213): average number of verb phrases per sentence (could udpipe do this? https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html)
- LogL, WSJ (r = .3723): as above, but with WSJ as the source corpus; the more likely an article, the better
- Number of words (r = -.3713): longer articles are considered less readable
https://aclanthology.org/D08-1020.pdf
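The "LogL" features score an article by how probable its words are under a model of a reference corpus. A minimal sketch, assuming a unigram model, so that log likelihood = sum over word types w of C(w) × log P(w | corpus). The add-one (Laplace) smoothing with a single reserved slot for unseen words, the lowercase regex tokeniser, and the names `UnigramModel` and `article_log_likelihood` are all hypothetical choices for illustration, not necessarily what the paper did.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical tokeniser: lowercase alphabetic runs (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


class UnigramModel:
    """Add-one smoothed unigram model over a reference corpus (assumed smoothing)."""

    def __init__(self, corpus_texts: list[str]):
        self.counts = Counter()
        for text in corpus_texts:
            self.counts.update(tokenize(text))
        self.total = sum(self.counts.values())
        # +1 reserves probability mass for words never seen in the corpus.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, word: str) -> float:
        # Counter returns 0 for missing keys, so unseen words get (0 + 1) / denominator.
        return math.log((self.counts[word] + 1) / (self.total + self.vocab_size))


def article_log_likelihood(article: str, model: UnigramModel) -> float:
    """Sum over word types of C(w) * log P(w | model); higher means more familiar."""
    counts = Counter(tokenize(article))
    return sum(c * model.log_prob(w) for w, c in counts.items())


model = UnigramModel(["the cat sat on the mat"])
print(article_log_likelihood("the cat", model))
```

Because every token adds a negative term, the raw log likelihood falls as articles get longer, so it partly overlaps with the number-of-words feature; dividing by the token count gives a length-normalised variant if the two need separating.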
- general word familiarity (use a commonly used corpus): how probable an article is given its vocabulary (article likelihood)
- technical word familiarity (scrape r4ds and some other good sources)
- document length
- verb phrases
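Document length and verb phrases could both come from POS-tagged text. A minimal sketch, assuming the vignette has already been tagged with Universal POS tags (udpipe's annotation output has a `upos` column, for instance); here the input is simply a Python list of sentences of (token, UPOS) pairs. Counting VERB tokens per sentence is only a proxy: Pitler and Nenkova count verb phrases from a parse tree, which this does not do. The function name `length_and_verb_features` is hypothetical.

```python
from statistics import mean

# Hypothetical input shape: one list of (token, UPOS tag) pairs per sentence.
Sentence = list[tuple[str, str]]


def length_and_verb_features(sentences: list[Sentence]) -> dict[str, float]:
    """Word count (excluding punctuation) and average VERB tokens per sentence."""
    if not sentences:
        raise ValueError("need at least one sentence")
    n_words = sum(1 for s in sentences for _, upos in s if upos != "PUNCT")
    # Proxy for verb phrases: count tokens tagged VERB (AUX deliberately excluded).
    verbs_per_sentence = [sum(1 for _, upos in s if upos == "VERB") for s in sentences]
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "avg_verbs_per_sentence": mean(verbs_per_sentence),
    }


tagged = [
    [("Load", "VERB"), ("the", "DET"), ("data", "NOUN"), (".", "PUNCT")],
    [("Then", "ADV"), ("fit", "VERB"), ("and", "CCONJ"), ("plot", "VERB"),
     ("it", "PRON"), (".", "PUNCT")],
]
print(length_and_verb_features(tagged))
```

Since AUX tokens are not counted, "has been loaded" contributes one verb; whether auxiliaries should count separately is an open choice.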