Feature: Improve the calculation of similarity scores between answers and correct solutions

Open blmage opened this issue 4 years ago • 7 comments

Currently, the similarity between answers and correct solutions is computed as-is, with only Unicode normalization being applied. Therefore, accented letters and their unaccented counterparts are considered completely different characters.

While this is desirable when the user enters a "perfect" answer with regard to accents, the resulting order can look quite random when accents are missing or wrong.

A solution would be to compute two similarity scores, applying more or less normalization, then averaging them in a consistent way.
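One way to read this proposal, as a minimal TypeScript sketch: compute a strict score on NFC-normalized text and a loose score with diacritics stripped, then take a weighted average of the two. The function names, the Levenshtein-based similarity measure, and the default weight of 0.5 are all assumptions made for illustration; they are not taken from the extension's code.

```typescript
// Remove combining diacritical marks (U+0300 to U+036F) after decomposition.
function stripAccents(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").normalize("NFC");
}

// Levenshtein distance, computed row by row over code points.
function levenshtein(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let previous: number[] = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      current.push(
        Math.min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
      );
    }
    previous = current;
  }
  return previous[t.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
function similarity(a: string, b: string): number {
  const longest = Math.max(Array.from(a).length, Array.from(b).length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Hypothetical combination: weighted average of a strict and a loose score.
function combinedSimilarity(
  answer: string,
  solution: string,
  looseWeight: number = 0.5
): number {
  const strict = similarity(answer.normalize("NFC"), solution.normalize("NFC"));
  const loose = similarity(stripAccents(answer), stripAccents(solution));
  return (1 - looseWeight) * strict + looseWeight * loose;
}
```

With these assumptions, an answer that only lacks an accent ("cafe" against "café") scores 0.875, while one with a wrong letter ("cafx") scores 0.75, so the near miss ranks higher without being treated as a perfect match.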

blmage avatar May 15 '20 10:05 blmage

A better approach: let the user choose, through an option, what is significant and what is not (see #25). This could include:

  • case,
  • accents,
  • ~punctuation,~
  • ~spaces,~
  • word order (using an adapted version of the diff package, or probably rather the SentenceSimilarity package - benchmark this on big lists of solutions to check whether this is a no-go).
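As a rough illustration of how such options could feed the comparison, the TypeScript sketch below normalizes a string according to three flags before any score is computed. The `ComparisonOptions` shape and the flag names are hypothetical, and sorting the words is only a crude stand-in for real word-order handling; the packages mentioned above would do this differently.

```typescript
// Hypothetical option set mirroring the items that are not struck through.
interface ComparisonOptions {
  ignoreCase: boolean;
  ignoreAccents: boolean;
  ignoreWordOrder: boolean;
}

// Reduce a string to the form in which only the significant features remain.
function normalizeForComparison(
  text: string,
  options: ComparisonOptions
): string {
  let result = text.normalize("NFC");
  if (options.ignoreCase) {
    result = result.toLowerCase();
  }
  if (options.ignoreAccents) {
    // Decompose, drop combining marks, then recompose.
    result = result
      .normalize("NFD")
      .replace(/[\u0300-\u036f]/g, "")
      .normalize("NFC");
  }
  if (options.ignoreWordOrder) {
    // Assumption: treating the sentence as a sorted bag of words.
    result = result
      .split(/\s+/)
      .filter((word) => word.length > 0)
      .sort()
      .join(" ");
  }
  return result;
}
```

Both the user's answer and each solution would go through the same function, so that a disabled feature cannot influence the score.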

blmage avatar Jun 21 '20 08:06 blmage

In my experience the order is completely off. There have been absurd sentences at the top (without any noticeable similarity) when the alphabetical sort gave me much more similar answers.

tobiornottobi avatar Sep 01 '20 08:09 tobiornottobi

@tobiornottobi Could you please send one or two screenshots with examples of such behavior?

I'm only aware of this happening with missing or different diacritics, but I'll increase the priority of this issue if this happens to be more widespread.

Thanks!

blmage avatar Sep 05 '20 07:09 blmage

@blmage Yes, I can. One thing I have to add: I wasn't sure whether the .* sort↓ button toggles the other option or says which option is currently active. The results weren't sorted alphabetically, so maybe it's actually the alphabetical sort that is broken for me.

I haven't gotten absurd suggestions this time, because the accepted answers are all reasonable and similar, but I still don't understand the order. This is neither sorted by similarity nor alphabetically, unless only the first word is taken into account.

image

This makes sense similarity-wise:

image

I'll try to remember to take a screenshot in the future.

tobiornottobi avatar Sep 05 '20 13:09 tobiornottobi

@tobiornottobi Thanks for the screenshots!

The UI reflects the current state, so when "Alphabetical sort ↓" is displayed, solutions are/should be sorted alphabetically and in descending order.

The order on the first screenshot seems correct, apart from the two solutions at the top, but I couldn't reproduce the same result in isolation (when testing the comparison algorithm, "ä" comes before "b", as expected).

Could you point me to a skill in the Norwegian tree that uses a lot of accented words? (I'll try to reproduce it from there instead)

blmage avatar Sep 08 '20 11:09 blmage

@blmage Thank you. :) The screenshot was from the Swedish tree. I can't search at the moment unfortunately.

tobiornottobi avatar Oct 23 '20 14:10 tobiornottobi

My bad! In the case of Swedish then, this seems to be the expected behavior:

In addition to the basic twenty-six letters, A–Z, the Swedish alphabet includes Å, Ä, and Ö at the end. They are distinct letters in Swedish, and are sorted after Z as shown above.

Wikipedia
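The difference between the two orderings can be sketched in TypeScript as follows. The hand-written alphabet table and both comparator names are assumptions for illustration only; this is not the extension's comparison algorithm.

```typescript
// Swedish alphabet order: Å, Ä and Ö are distinct letters placed after Z.
const SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö".normalize("NFC");

// Position of a letter in the Swedish alphabet; unknown characters go last.
function swedishRank(char: string): number {
  const index = SWEDISH_ALPHABET.indexOf(char.toLowerCase());
  return index === -1 ? SWEDISH_ALPHABET.length : index;
}

// Compare two words letter by letter using Swedish alphabet positions.
function compareSwedish(a: string, b: string): number {
  const s = Array.from(a.normalize("NFC"));
  const t = Array.from(b.normalize("NFC"));
  const shared = Math.min(s.length, t.length);
  for (let i = 0; i < shared; i++) {
    const diff = swedishRank(s[i]) - swedishRank(t[i]);
    if (diff !== 0) return diff;
  }
  return s.length - t.length;
}

// Compare by base letters only, so that "ä" sorts like "a" (before "b").
function compareBaseLetters(a: string, b: string): number {
  const strip = (text: string): string =>
    text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
  const x = strip(a);
  const y = strip(b);
  return x < y ? -1 : x > y ? 1 : 0;
}

const words = ["öl", "bil", "äpple", "zebra", "år"];
const swedishOrder = words.slice().sort(compareSwedish);
// bil, zebra, år, äpple, öl
const baseLetterOrder = words.slice().sort(compareBaseLetters);
// äpple, år, bil, öl, zebra
```

In a browser, the built-in `Intl.Collator` with a Swedish locale tag would likely be the more practical route; the explicit table is used here only to keep the sketch independent of the runtime's locale data.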

blmage avatar Oct 26 '20 11:10 blmage