duolingo-solution-viewer
Feature: Improve the calculation of similarity scores between answers and correct solutions
Currently, the similarity between answers and correct solutions is computed as-is, with only Unicode normalization being applied. Therefore, accented letters and their unaccented counterparts are considered completely different characters.
While this is desirable when the user enters a "perfect" answer with regard to accents, the results can get quite random otherwise.
A solution would be to compute two similarity scores, applying more or less normalization, then average them in a consistent way.
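A minimal sketch of that averaging idea, in TypeScript. The Levenshtein-based `similarity` is a stand-in chosen for illustration, not the metric the extension actually uses, and `stripAccents`, `blendedSimilarity`, and `strictWeight` are hypothetical names:

```typescript
// Remove diacritics: decompose to NFD, then drop combining marks (U+0300..U+036F).
function stripAccents(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Classic Levenshtein edit distance, computed row by row.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical strings.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Weighted average of a strict score (Unicode normalization only)
// and a loose score (accents stripped as well).
function blendedSimilarity(answer: string, solution: string, strictWeight = 0.5): number {
  const strict = similarity(answer.normalize("NFC"), solution.normalize("NFC"));
  const loose = similarity(stripAccents(answer), stripAccents(solution));
  return strictWeight * strict + (1 - strictWeight) * loose;
}
```

With equal weights, `blendedSimilarity("cafe", "café")` gives 0.875: the strict score is 0.75 (one substitution over four characters) and the accent-insensitive score is 1. A perfectly accented answer still scores 1, while a missing accent costs less than a wrong letter.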
Rather, leave it to the user to decide what is significant and what is not, as an option (see #25). This could include:
- case,
- accents,
- ~punctuation,~
- ~spaces,~
- word order (using an adapted version of the diff package, or probably rather the SentenceSimilarity package; benchmark this on big lists of solutions to check whether this is a no-go).
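For the two options that are not struck out (case and accents), the user-controlled normalization could look roughly like this. It is only a sketch: `ComparisonOptions`, `ignoreCase`, `ignoreAccents`, and `normalizeForComparison` are hypothetical names, not existing settings of the extension.

```typescript
// Hypothetical user-facing options: each flag marks a difference as insignificant.
interface ComparisonOptions {
  ignoreCase: boolean;
  ignoreAccents: boolean;
}

// Normalize a string before it is handed to the similarity function.
function normalizeForComparison(text: string, options: ComparisonOptions): string {
  // Unicode normalization is always applied, as in the current behavior.
  let result = text.normalize("NFC");
  if (options.ignoreAccents) {
    // Decompose, then drop combining diacritical marks (U+0300..U+036F).
    result = result.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
  }
  if (options.ignoreCase) {
    result = result.toLowerCase();
  }
  return result;
}
```

Both the answer and each solution would go through the same function, so that `normalizeForComparison("École", { ignoreCase: true, ignoreAccents: true })` and the same call on `"ecole"` produce the same string.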
In my experience the order is completely off. There have been absurd sentences at the top (without any noticeable similarity) when the alphabetical sort gave me much more similar answers.
@tobiornottobi Could you please send one or two screenshots with examples of such behavior?
I'm only aware of this happening with missing or different diacritics, but I'll increase the priority of this issue if this happens to be more widespread.
Thanks!
@blmage Yes, I can. One thing I have to add: I wasn't sure whether the ".* sort ↓" button switches to the other option or shows which option is currently active. The results weren't sorted alphabetically, so maybe it's actually the alphabetical sort that is broken for me.
I haven't gotten absurd suggestions this time – because the accepted answers are all reasonable and similar, but I still don't understand the order.
This is neither sorted by similarity nor alphabetically. Unless only the first word is taken into account.
This makes sense similarity-wise:
I'll try to remember to take a screenshot in the future.
@tobiornottobi Thanks for the screenshots!
The UI reflects the current state, so when "Alphabetical sort ↓" is displayed, solutions are/should be sorted alphabetically and in descending order.
The order on the first screenshot seems correct, apart from the two solutions at the top, but I couldn't reproduce the same result in isolation (when testing the comparison algorithm, "ä" comes before "b", as expected).
Could you point me to a skill in the Norwegian tree that uses a lot of accented words? (I'll try to reproduce it from there instead)
@blmage Thank you. :) The screenshot was from the Swedish tree. I can't search at the moment unfortunately.
My bad! In the case of Swedish then, this seems to be the expected behavior:
In addition to the basic twenty-six letters, A–Z, the Swedish alphabet includes Å, Ä, and Ö at the end. They are distinct letters in Swedish, and are sorted after Z as shown above.
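The difference between the two orderings can be reproduced with the standard `Intl.Collator`. This is only an illustration of locale-dependent sorting, not necessarily how the extension compares solutions, and it assumes a runtime with Swedish and German locale data (the sample words are arbitrary):

```typescript
// Locale-aware comparison functions for German and Swedish.
const germanCompare = new Intl.Collator("de").compare;
const swedishCompare = new Intl.Collator("sv").compare;

// In German, "ä" is treated as a variant of "a", so it sorts before "b" and "z".
const germanResult = germanCompare("ä", "z"); // negative

// In Swedish, "ä" is a distinct letter placed after "z".
const swedishResult = swedishCompare("ä", "z"); // positive

// Sorting the same words under each locale.
const words = ["zebra", "äpple", "bil", "ö", "å"];
const germanOrder = [...words].sort(germanCompare);
const swedishOrder = [...words].sort(swedishCompare);

console.log(germanOrder.join(", "));
console.log(swedishOrder.join(", "));
```

Under the Swedish collator, the words starting with Å, Ä, and Ö end up after "zebra", which matches the order seen in the screenshot.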