py-readability-metrics

Dale-Chall score

Open jrraines opened this issue 3 years ago • 9 comments

I compared the Dale-Chall results on one of Robert Munsch's stories ('Love You Forever') from this with other scoring software (e.g. the one you cite). The results were drastically different. I don't know much about the topic, but looking at wikipedia I see that the 1995 revision of Dale-Chall (that expanded the word list to 3000) completely changed the formula:

"In 1995, Dale and Chall published a new version of their formula with an upgraded word list, the New Dale–Chall readability formula.[45] Its formula is:

Raw score = 64 - 0.95 *(PDW) - 0.69 *(ASL) "

vs. what is in the code :

    def _score(self):
        stats = self._stats
        words_per_sent = stats.num_words / stats.num_sentences
        percent_difficult_words = stats.num_dale_chall_complex / stats.num_words * 100
        raw_score = 0.1579 * percent_difficult_words + 0.0496 * words_per_sent
        adjusted_score = raw_score + 3.6365 if percent_difficult_words > .05 else raw_score
        return adjusted_score

Wikipedia shows that formula from the earlier version of Dale-Chall: "this equation from 1948:

Raw score = 0.1579*(PDW) + 0.0496*(ASL) if the percentage of PDW is less than 5 %, otherwise compute
Raw score = 0.1579*(PDW) + 0.0496*(ASL) + 3.6365"
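
For comparison, the two formulas quoted above can be written side by side. This is only a minimal sketch, not the library's code: the function names `dale_chall_1948` and `dale_chall_1995_raw` are made up for illustration, the coefficients are the ones quoted from Wikipedia in this issue, and the 5% boundary follows the quoted wording (below 5% means no adjustment). The inputs at the bottom are arbitrary illustrative values.

```python
def dale_chall_1948(pdw, asl):
    """1948 formula as quoted above.

    pdw: percentage of difficult words (0-100)
    asl: average sentence length in words
    """
    raw = 0.1579 * pdw + 0.0496 * asl
    # Per the quoted text: no adjustment below 5% difficult words.
    return raw if pdw < 5 else raw + 3.6365


def dale_chall_1995_raw(pdw, asl):
    """1995 'New Dale-Chall' raw score as quoted above."""
    return 64 - 0.95 * pdw - 0.69 * asl


if __name__ == "__main__":
    pdw, asl = 8.0, 20.0  # illustrative inputs, not measured from any text
    print("1948:", round(dale_chall_1948(pdw, asl), 4))
    print("1995 raw:", round(dale_chall_1995_raw(pdw, asl), 4))
```

The two results are on different scales, so they cannot be compared directly; the point is only that the same PDW and ASL feed two quite different equations.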

jrraines avatar Aug 19 '20 17:08 jrraines

https://en.wikipedia.org/wiki/Readability

Which is different from their article on Dale-Chall which just gives the old formula. Another issue I noticed while googling around is whether unique unfamiliar words or number of unfamiliar words figure in the formula--for the text I used this is a big deal. Do we count 27 instances of 'forth' or one?

It doesn't seem like the grade level equivalence can use the old formula but I haven't googled up a new one.

jrraines avatar Aug 19 '20 17:08 jrraines

we count each occurrence of the word. words are captured using lists, not sets, thus it's not unique words but all words. all in all, for the case above, it would be 27 instances of forth.

If you find that sets of words should be used in some instances, the change would be fairly straightforward. all statistics input to each scorer are computed here.

feel free to experiment, i'm happy to accept PRs. also, happy to make the change as well, if you find something definitive

cdimascio avatar Aug 20 '20 00:08 cdimascio
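
To illustrate the list-versus-set distinction described above, here is a small sketch. It does not use the library's analyzer; `tokenize`, `count_unfamiliar`, and the tiny `FAMILIAR` set are hypothetical stand-ins (the real Dale-Chall list has about 3000 entries).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_unfamiliar(tokens, familiar_words, unique=False):
    """Count tokens not in familiar_words, per occurrence or per distinct word."""
    unfamiliar = [t for t in tokens if t not in familiar_words]
    return len(set(unfamiliar)) if unique else len(unfamiliar)


# Hypothetical familiar-word set, only large enough for this sentence.
FAMILIAR = {"she", "rocked", "him", "back", "and"}

tokens = tokenize("She rocked him back and forth, back and forth, back and forth.")
print(count_unfamiliar(tokens, FAMILIAR))               # per occurrence (list)
print(count_unfamiliar(tokens, FAMILIAR, unique=True))  # per distinct word (set)
```

With a list, every repetition of "forth" adds to the difficult-word count; with a set it is counted once.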

@jrraines fyi, u may want to update to the latest version.

It corrects if percent_difficult_words > .05 to if percent_difficult_words > 5

cdimascio avatar Aug 20 '20 01:08 cdimascio
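
A quick sketch of why that threshold matters: `percent_difficult_words` is already multiplied by 100, so comparing it against `.05` triggers the 3.6365 adjustment for almost any text. The helper `adjusted_score` and its `threshold` parameter are made up here for illustration.

```python
def adjusted_score(raw_score, percent_difficult_words, threshold=5):
    """Add the 3.6365 adjustment only when the percentage exceeds the threshold."""
    if percent_difficult_words > threshold:
        return raw_score + 3.6365
    return raw_score


pdw = 2.0  # e.g. 2 difficult words in a 100-word text, expressed as a percentage
raw = 1.0  # arbitrary raw score for the comparison

print(adjusted_score(raw, pdw, threshold=.05))  # old comparison: adjustment applied
print(adjusted_score(raw, pdw, threshold=5))    # corrected comparison: no adjustment
```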

Thanks for your reply! I will try that later.

I think the deal is that 1. all the online tools are using the 1995 3000 word list with the 1948 formula (that was intended and validated with a 1000 word list) and 2. the algorithm is supposed to be used on a text sample of about 1000 words and the number of unique words (i.e. the set). Again, I don't know a lot about it, yet. I did order the 1995 book to try to understand what is going on better.

Here's what I wrote out to clarify what I saw for myself:

I used https://www.interventioncentral.org/teacher-resources/oral-reading-fluency-passages-generator on the text of Robert Munsch's Love You Forever story and got these readability scores:

FORCAST: 7.33
Spache: 3.71
Dale-Chall: 3
Flesch-Kincaid: 5.2
Coleman-Liau: 4.0
Automated Readability Index: 5.0
Flesch Reading Ease: 91.2/100
Fog Index: 8.4
Lix Formula: 23.6 = below school year 5
SMOG-Grading: 6.5

Then I took the same text and used py-readability-metrics library and got these scores:

Flesch-Kincaid: score 6.200658777768648 and grade level 6.
Flesch reading ease: score 84.35790204618294; ease: easy; grade level ['6'].
Dale-Chall: score 6.940605503054209 and grade level ['7', '8'].
Automated Reading Index: score 4.968700797107406; grade level ['5']; age [10, 11].
Gunning Fog: score 8.792845207768375 and grade level 9.
Coleman Liau: score 3.332767962308594 and grade level 3.
Smog: score 7.485028197883463 and grade level 7.
Spache: score 4.6358866518749835 and grade level 5.
Linsear Write: score 10.453488372093023 and grade level 10.
Statistics: {'num_letters': 2979, 'num_words': 849, 'num_sentences': 43, 'num_polysyllabic_words': 25, 'avg_words_per_sentence': 19.74418604651163, 'avg_syllables_per_word': 1.210836277974087}

The open source python library gives higher scores for most of the measures. ARI is almost identical and Dale-Chall is very different. Coleman Liau is lower in the open source python tool.

It seems to me that 3rd graders can virtually all handle the text (which does have some long sentences). Munsch uses repetition at many levels in the story and that helps the students build speed and may not be accounted for by any of the scoring systems. I'm bothered by the lack of agreement between the two implementations.

Here is the text of the story:

A mother held her new baby and very slowly rocked him back and forth, back and forth, back and forth, back and forth. And while she held him, she sang:

I'll love you forever, I'll like you for always, as long as I'm living my baby you'll be.

The baby grew. He grew and he grew and he grew. He grew until he was two years old and he ran all around the house. He pulled all the books off the shelves. He pulled all the food out of the refrigerator and he took his mother's watch and flushed it down the toilet. Sometimes his mother would say, "This kid is driving me CRAZY!"
But at night time, when that two-year-old was quiet, she opened the door to his room, crawled across the floor, looked up over the side of his bed; and if he was really asleep she picked him up and rocked him back and forth, back and forth, back and forth, back and forth. And while she held him, she sang:

I'll love you forever, I'll like you for always, as long as I'm living my baby you'll be.

The little boy grew. He grew and he grew and he grew. He grew until he was nine years old. And he never wanted to come in for dinner, he never wanted to take a bath, and when grandma visited he always said bad words. Sometimes his mother wanted to sell him to the zoo!
But at night time, when he was asleep, she opened the door to his room, crawled across the floor, looked up over the side of his bed; and if he was really asleep she picked up that nine-year-old boy and rocked him back and forth, back and forth, back and forth, back and forth. And while she held him, she sang:

I'll love you forever, I'll like you for always, as long as I'm living my baby you'll be.

The boy grew. He grew and he grew and he grew. He grew until he was a teenager. He had strange friends and he wore strange clothes and he listened to strange music. Sometimes the mother felt like she was in a zoo!
But at night time, when that teenager was quiet, she opened the door to his room, crawled across the floor, looked up over the side of his bed; and if he was really asleep she picked him up and rocked him back and forth, back and forth, back and forth, back and forth. And while she held him, she sang:

I'll love you forever, I'll like you for always, as long as I'm living my baby you'll be.

That teenager grew. He grew and he grew and he grew. He grew until he was a grown-up man. He left home and got a house across town.
But sometimes on dark nights the mother got into her car and drove across town. If all the lights in her son's house were out, she opened his bedroom window, crawled across the floor, and looked up over the side of his bed. If that great big man was really asleep she picked him up and rocked him back and forth, back and forth, back and forth, back and forth. And while she held him, she sang:

I'll love you forever, I'll like you for always, as long as I'm living my baby you'll be.

Well, that mother, she got older. She got older and older and older. One day she called up her son and said, "You'd better come see me because I'm very old and sick." So her son came to see her. When he came in the door she tried to sing the song. She sang:

I'll love you forever, I'll like you for always... But she couldn't finish because she was too old and sick. The son went to his mother. He picked her up and rocked her back and forth, back and forth, back and forth. And he sang this song: I'll love you forever, I'll like you for always, as long as I'm living my Mommy you'll be.

When the son came home that night, he stood for a long time at the top of the stairs.
Then he went into the room where his very new baby daughter was sleeping. He picked her up in his arms and very slowly rocked her back and forth, back and forth, back and forth, back and forth. And while he held her, he sang:

I'll love you forever, I'll like you for always, as long as I'm living my baby you'll be.

So then I went to https://readabilityformulas.com/free-readability-formula-tests.php, which is referenced in the python open source tool's documentation. That gives these results, which are closer to the results from intervention central:

Flesch Reading Ease score: 91 (text scale) Flesch Reading Ease scored your text: very easy to read.

Gunning Fog: 7.5 (text scale) Gunning Fog scored your text: fairly easy to read.

Flesch-Kincaid Grade Level: 4.9 Grade level: Fifth Grade.

The Coleman-Liau Index: 4 Grade level: Fourth Grade

The SMOG Index: 3.5 Grade level: Fourth Grade

Automated Readability Index: 4.9 Grade level: 8-9 yrs. old (Fourth and Fifth graders)

Linsear Write Formula: 7.9 Grade level: Eighth Grade.

Readability Consensus Based on (7) readability formulas, we have scored your text:

Grade Level: 5 Reading Level: very easy to read. Reader's Age: 8-9 yrs. old (Fourth and Fifth graders)

Dale-Chall has a separate calculator at the site:

Number of words NOT found on Dale-Chall Word List: 65

Percent of words NOT found on Dale-Chall Word List: 8%

(Show words NOT on the Dale-Chall Word List) forth(27) | living(7) | pulled(2) | flushed(1) | driving(1) | opened(4) | looked(4) | picked(6) | wanted(3) | teenager(3) | older(4) | called(1) | tried(1) | mommy(1)

Dale-Chall Formula worksheet
Raw score: 2.1767
Adjusted Score: (3.6365 + 2.1767)
Final Score: 5.8

New Dale-Chall Readability Index: Grade level: Grades 5 - 6

So the repetition of 'back and forth' was counted heavily against the passage, whereas in actual practice it would build speed!

And furthermore there is poor agreement between the scores at the site referenced by py-readability-metrics and the output of their tool.
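
To see how much the counting rule matters for this passage, here is a rough back-of-the-envelope sketch using the 1948 formula quoted at the top of the issue. It mixes numbers from two sources, which is an assumption on my part: the unfamiliar-word counts come from the readabilityformulas.com list above, while the word and sentence counts (849 and 43) come from the py-readability-metrics statistics. The result should therefore not be expected to match either tool exactly. The function name is made up for illustration.

```python
# Unfamiliar words and their occurrence counts, copied from the
# readabilityformulas.com output above.
unfamiliar_counts = {
    "forth": 27, "living": 7, "pulled": 2, "flushed": 1, "driving": 1,
    "opened": 4, "looked": 4, "picked": 6, "wanted": 3, "teenager": 3,
    "older": 4, "called": 1, "tried": 1, "mommy": 1,
}

# Word and sentence counts as reported by py-readability-metrics above.
NUM_WORDS = 849
NUM_SENTENCES = 43


def dale_chall_1948(num_unfamiliar, num_words, num_sentences):
    """1948 formula as quoted in this issue; adjustment applies at 5% or more."""
    pdw = num_unfamiliar / num_words * 100
    asl = num_words / num_sentences
    raw = 0.1579 * pdw + 0.0496 * asl
    return raw if pdw < 5 else raw + 3.6365


per_occurrence = dale_chall_1948(sum(unfamiliar_counts.values()), NUM_WORDS, NUM_SENTENCES)
per_unique = dale_chall_1948(len(unfamiliar_counts), NUM_WORDS, NUM_SENTENCES)

print(round(per_occurrence, 2), round(per_unique, 2))
```

Counting every occurrence gives roughly 5.8, while counting each distinct unfamiliar word once gives roughly 1.2, because the percentage drops below 5% and the 3.6365 adjustment no longer applies.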

jrraines avatar Aug 20 '20 15:08 jrraines

I don't see that that was a change from what I had before. I'm a bit confused since __version is 1.4.5 and github seems like it's 1.4.4

the deal is that I am retired and have volunteered with Reading Corps (part of Americorps) for 4 years. I'm really fooling around waiting for training for the coming year to start next week and looking for something to occupy my mind, to keep it off current events a bit! We have used that story with 3rd graders who are behind where they should be and it certainly should not be scored at grade 7!

I tried modifying analyzer.py: taking the dale_chall complex count out of the for loop, then taking the set(tokens) and then doing a for loop on the set. That gives score 1.5 and grade level [1,2,3,4] which might agree with intervention central, at least and certainly comes closer to my expectation. I did not do anything to limit the text length to 1000 words (which seems advisable; if the reader has recently decoded a word then it sort of temporarily joins the list of common, known words. If the word was decoded farther back than 1000 words ago, then it wouldn't necessarily be recalled).
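
The "recently decoded" idea in the parenthesis above could be sketched roughly like this. It is purely hypothetical and not part of any Dale-Chall procedure or of the library: `count_unfamiliar_with_window` counts an unfamiliar word again only if its previous occurrence lies more than `window` tokens back.

```python
def count_unfamiliar_with_window(tokens, familiar_words, window=1000):
    """Count unfamiliar tokens, skipping repeats seen within the last `window` tokens."""
    last_seen = {}  # unfamiliar token -> position of its most recent occurrence
    count = 0
    for position, token in enumerate(tokens):
        if token in familiar_words:
            continue
        previous = last_seen.get(token)
        # Count first occurrences, and repeats whose last occurrence is too far back.
        if previous is None or position - previous > window:
            count += 1
        last_seen[token] = position
    return count


tokens = ["forth", "a", "forth", "a", "a", "a", "forth"]
familiar = {"a"}  # hypothetical one-word familiar list

print(count_unfamiliar_with_window(tokens, familiar, window=1000))  # behaves like a set
print(count_unfamiliar_with_window(tokens, familiar, window=0))     # behaves like a list
```

A very large window reproduces the set-based count and a window of zero reproduces the per-occurrence count, so the parameter interpolates between the two behaviours discussed in this issue.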

Anyway the book will arrive in a couple of weeks.

On Aug 19, 2020, at 8:01 PM, Carmine DiMascio [email protected] wrote:

@jrraines https://github.com/jrraines fyi, u may want to update to the latest version.

It corrects if percent_difficult_words > .05 to if percent_difficult_words > 5

ā€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cdimascio/py-readability-metrics/issues/17#issuecomment-676836000, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIIQKDQKVISU7ASJSMJQA3SBRYXDANCNFSM4QFFQPEA.

jrraines avatar Aug 20 '20 18:08 jrraines

The 1995 book did come yesterday. After looking at it, I would say that it is no wonder that online tools produce such discordant results. The procedure they describe is complex. Parts of it could be implemented by algorithm but even those would not necessarily produce the same results. Indeed, since what is recommended is to pick (and average) several 100 word samples from the text being analyzed, not the first or last segments, it seems like the same algorithm might produce different results on successive attempts if it used a random number generator correctly. Many of the rules seem (to my amateur eye) to be impossible to implement by algorithm; they involve human judgement (e.g. how to treat hyphenated words depending on their customary use).
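
The sampling step described above might look something like the following. This is only a sketch under my own assumptions, not the book's procedure: the text is cut into consecutive 100-word segments, the first and last are dropped, and a few of the rest are picked at random. The names `sample_segments` and `mean_percent_unfamiliar`, the default of three samples, and the toy token list are all hypothetical.

```python
import random


def sample_segments(tokens, sample_size=100, num_samples=3, rng=None):
    """Pick random full-size segments, excluding the first and last segment."""
    rng = rng or random.Random()
    segments = [tokens[i:i + sample_size]
                for i in range(0, len(tokens) - sample_size + 1, sample_size)]
    interior = segments[1:-1]
    if len(interior) < num_samples:
        raise ValueError("text is too short for the requested number of samples")
    return rng.sample(interior, num_samples)


def mean_percent_unfamiliar(samples, familiar_words):
    """Average, over the samples, of the percentage of tokens not in familiar_words."""
    percents = [100 * sum(t not in familiar_words for t in s) / len(s) for s in samples]
    return sum(percents) / len(percents)


# Toy input: 1000 placeholder tokens, with every tenth one treated as unfamiliar.
tokens = ["hard" if i % 10 == 0 else "easy" for i in range(1000)]
samples = sample_segments(tokens, rng=random.Random(0))  # fixed seed for repeatability
print(mean_percent_unfamiliar(samples, {"easy"}))
```

Passing a seeded `random.Random` makes a run repeatable; without a seed, two runs on the same text can pick different segments and so report different scores, which is the effect described above.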

You are correct that 'forth' would be counted each time it occurred in a 100 word sample. Proper names that occurred repeatedly would be counted just once. Numbers would be counted as familiar or unfamiliar depending on how big they were; commas or periods in numbers (recall that U.S. and European conventions for those are reversed) would complicate the algorithm but would be the least of the problems.

If a section heading occurs in the 100 word selection, it gets counted as a sentence (despite lacking a period); I suppose an algorithm might use end-of-line as an indicator.

No long texts to validate calculations are really given, some short examples exist toward the end of the book. But they are really too short for the full procedure to be applied to them.

The revised equation for Cloze score is given as described in wikipedia. There is no equation for grade level given; one uses tables for that.

I was left confused about the word list used in 1948. It seems like early in the 1940's Dale had a 769 word list, but later (maybe in 1948) he had a 3000 word list. The list was revised (in the 80's) replacing agricultural terms with more modern words.

I'm disappointed; they describe validating its accuracy quite spectacularly. I imagine that it took quite some time to train students in their dept. to be able to reproduce the professors' judgement, however.

jrraines avatar Aug 27 '20 21:08 jrraines

After looking at it, I would say that it is no wonder that online tools produce such discordant results. The procedure they describe is complex. Parts of it could be implemented by algorithm but even those would not necessarily produce the same results.

Exactly. An area to look at that can certainly help to improve scores would be to enhance the syllable count estimator. Syllable counts play a role in many of these scores. The current estimator (https://github.com/cdimascio/py-readability-metrics/blob/master/readability/text/syllables.py) is fairly simple. It would be interesting to improve this; it will certainly impact the resulting score. This blog post (https://eayd.in/?p=232) attempts a more sophisticated approach. It may be interesting to include some of those ideas here.

cdimascio avatar Oct 09 '20 14:10 cdimascio

It is a difficult issue. I see Knuth addressed hyphenation, which turns out to be a different problem. The basic idea of http://www.onebloke.com/2011/06/counting-syllables-accurately-in-python-on-google-app-engine/ seems almost like cheating, but it is robust within the limits of the dictionary bundled into nltk.

jrraines avatar Oct 11 '20 02:10 jrraines
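
For readers unfamiliar with what a "simple" syllable estimator looks like, here is a generic vowel-group heuristic. To be clear, this is not the library's estimator and not the approach from either linked post; `estimate_syllables` is a hypothetical name, and the silent-e rule is just one common rule of thumb that will miscount plenty of English words.

```python
import re


def estimate_syllables(word):
    """Rough syllable estimate: count vowel groups, with a silent-e correction."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("love"), except in '-le' endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


for w in ["forth", "baby", "love", "table", "teenager", "refrigerator"]:
    print(w, estimate_syllables(w))
```

A dictionary lookup, as in the post linked above, avoids these heuristics for words it knows, but still needs a fallback like this for words that are not in the dictionary.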

thanks @jrraines , this is definitely worth a look

cdimascio avatar Oct 13 '20 00:10 cdimascio