Text-Statistics
Numbers are not handled correctly
From Google Code:
Numbers written as numerals in the text (1, 20, 100, etc.) may not be handled correctly.
This is currently an open question: should "20" be counted as two syllables ("twen-ty") or as one? Or should numbers be excluded from the calculations altogether?
I don't know whether you are still wondering about this, or whether you want to follow the original calculations specified by J. Peter Kincaid and his team in 1975, but the original paper[1] handles numbers as follows:
For the Flesch-Kincaid Reading Ease score, each number counts as one word, and its syllables are counted as the number is pronounced: "20" is "twen-ty" (2 syllables), and "1918" is "nineteen eighteen" (4 syllables).[2]
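To make the "count it as pronounced" rule concrete, here is a minimal Python sketch (not the library's code, which is PHP) that spells out an integer numeral and sums syllables from a hand-checked table. The names `spell_number`, `number_syllables` and the `year_style` flag are hypothetical. Deciding whether "1918" is a year ("nineteen eighteen") or a quantity ("one thousand nine hundred eighteen") is left to the caller, and decimals, currency and percent signs are not handled.

```python
# Hand-checked syllable counts for every word spell_number() can emit.
SYLLABLES = {
    "zero": 2, "one": 1, "two": 1, "three": 1, "four": 1, "five": 1,
    "six": 1, "seven": 2, "eight": 1, "nine": 1, "ten": 1, "eleven": 3,
    "twelve": 1, "thirteen": 2, "fourteen": 2, "fifteen": 2, "sixteen": 2,
    "seventeen": 3, "eighteen": 2, "nineteen": 2, "twenty": 2, "thirty": 2,
    "forty": 2, "fifty": 2, "sixty": 2, "seventy": 3, "eighty": 2,
    "ninety": 2, "hundred": 2, "thousand": 2,
}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def _under_100(n):
    """Words for 1..99."""
    if n < 20:
        return [ONES[n]]
    tens, units = divmod(n, 10)
    return [TENS[tens]] + ([ONES[units]] if units else [])


def _under_1000(n):
    """Words for 1..999, US style (no 'and')."""
    hundreds, rest = divmod(n, 100)
    words = [ONES[hundreds], "hundred"] if hundreds else []
    if rest:
        words += _under_100(rest)
    return words


def spell_number(n, year_style=False):
    """Spell 0..999999 as English words (assumed pronunciation rules)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only supports 0..999999")
    if year_style and 1000 <= n <= 9999:
        high, low = divmod(n, 100)
        if low == 0 and high % 10 != 0:
            return _under_100(high) + ["hundred"]   # 1900 -> nineteen hundred
        if low >= 10:
            return _under_100(high) + _under_100(low)  # 1918 -> nineteen eighteen
        # Anything else (2000, 1905) falls back to the cardinal reading.
    if n == 0:
        return ["zero"]
    thousands, rest = divmod(n, 1000)
    words = (_under_1000(thousands) + ["thousand"]) if thousands else []
    if rest:
        words += _under_1000(rest)
    return words


def number_syllables(token, year_style=False):
    """Syllables in an integer numeral such as '20' or '1,000'."""
    n = int(token.replace(",", ""))
    return sum(SYLLABLES[w] for w in spell_number(n, year_style))


print(number_syllables("20"))                     # twen-ty
print(number_syllables("1918", year_style=True))  # nine-teen eigh-teen
```

The table lookup is used instead of a vowel-counting heuristic because number words are a small closed set, so exact counts are cheap to get right.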
For the Gunning Fog Index, numbers are treated as "easy" words and get a score of one.[3]
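As a rough illustration of "numbers are always easy words", the Python sketch below classifies tokens so that a numeral can never be counted as a hard (three or more syllable) word. It plugs the result into the commonly cited Gunning formula, 0.4 × (words/sentences + 100 × hard/words), not the exact weighting in the paper. The vowel-group syllable counter, the tokenizer regexes and the function names are all simplifying assumptions, not the library's implementation.

```python
import re

# A token made only of digits, optional thousands commas and an optional decimal part.
NUMERAL = re.compile(r"^\d[\d,]*(?:\.\d+)?$")
VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def naive_syllables(word):
    """Very rough syllable estimate: count vowel groups, drop a silent final 'e'."""
    w = word.lower()
    count = len(VOWEL_GROUPS.findall(w))
    if w.endswith("e") and not w.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def is_hard_word(token):
    """Numerals are always easy; other words are hard at 3+ syllables."""
    if NUMERAL.match(token):
        return False
    return naive_syllables(token) >= 3


def gunning_fog(text):
    # Split sentences only on terminators followed by whitespace or end of text,
    # so a decimal like "3.5" does not end a sentence.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    words = re.findall(r"\d[\d,]*(?:\.\d+)?|[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    hard = sum(is_hard_word(w) for w in words)
    return 0.4 * (len(words) / len(sentences) + 100 * hard / len(words))


print(gunning_fog("The committee met in 1918."))
```

Without the numeral check, a syllable counter that spells numbers out would mark "1918" (eight syllables as a cardinal) as hard, which is exactly what the paper's rule avoids.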
The paper also covers other details of the calculations, such as how to handle currency symbols and percent signs.
[1] Kincaid, J.P., Fishburne, R.P., Rogers, R.L., & Chissom, B.S. (1975). Derivation of New Readability Formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease formula) for Navy Enlisted Personnel. Research Branch Report 8-75. Chief of Naval Technical Training: Naval Air Station Memphis.
[2] Kincaid 1975, p. 50.
[3] Kincaid 1975, p. 48.
Thanks, this is great info. A little more than I have time to incorporate at the moment, but interesting for future development.