"Lago", Italian for lake, is treated as two words, "L" and "ago" (abbr. of August), and interpreted as a date

Open ibobo opened this issue 9 years ago • 3 comments

I found that duckling understands text written in "CamelCase" and "UPPERCASElowercase" fashion by splitting it where the case changes. This is good, but it poses a problem when a valid word almost entirely consists of an abbreviation. This is the case for some Italian words when entered in "title case", and I bet it can happen in other languages too.

Some examples:

  • "Lago" -> "ago" is short for August; this breaks many location names, like "Lago di Como", "Lago di Garda" and the like...
  • "CaprI" -> "apr" is short for April; it's a "strange" casing, but it can happen

My proposal is to avoid breaking words at the "case barrier" if the whole text contains spaces, or if the only uppercase letters are single characters at the beginning of words (the latter is useful for texts formed by a single word).

This would break inputs such as "SOMETHINGlike this" and "Atext", but it would solve some nastier problems.
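The proposal above can be sketched roughly as follows. This is only an illustration in Python, not duckling's actual tokenizer: the function names `split_at_case_barrier` and `tokenize` are hypothetical, and the "case barrier" is assumed to be any position where a letter changes from uppercase to lowercase or the other way round.

```python
def split_at_case_barrier(word):
    """Split a word wherever the letter case changes (assumed current behaviour)."""
    pieces, start = [], 0
    for i in range(1, len(word)):
        prev, cur = word[i - 1], word[i]
        if (prev.isupper() and cur.islower()) or (prev.islower() and cur.isupper()):
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces


def tokenize(text):
    """Hypothetical tokenizer applying the two proposed exceptions."""
    words = text.split()
    # Exception 1: the text contains spaces, so trust the spaces only.
    if len(words) != 1:
        return words
    word = words[0]
    # Exception 2: the only uppercase letter (if any) is the first character.
    if not any(ch.isupper() for ch in word[1:]):
        return [word]
    # Otherwise keep splitting at the case barrier.
    return split_at_case_barrier(word)


print(split_at_case_barrier("Lago"))   # ['L', 'ago']
print(tokenize("Lago"))                # ['Lago']
print(tokenize("Lago di Como"))        # ['Lago', 'di', 'Como']
print(tokenize("UPPERCASElowercase"))  # ['UPPERCASE', 'lowercase']
```

With these rules "SOMETHINGlike this" and "Atext" are no longer split, which is the trade-off mentioned above.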

ibobo, Jan 16 '17 12:01

A similar problem occurs with "Vorrei fare UNA prenotazione per domani" (I'm not translating it into English, since this job needs an Italian or someone who speaks Italian).

UNA -> it is recognized as 1 o'clock (a datetime)

Take a look at this commit, maybe you can fix it with it: https://github.com/wit-ai/duckling/pull/203/commits/bb8444c1f3b8df8ac227cf54666a0fa7e8506b8b

tedicela, Jan 17 '17 14:01

Yes, that commit should fix that: as far as I understand it, a latent time should not show up "alone" as a winning result. We're facing that problem too, and I hope my pull request #203 gets merged soon.

ibobo, Jan 17 '17 15:01

Btw, that problem is not related to the one in this ticket

ibobo, Jan 17 '17 15:01