Recognizers-Text

[DE|EN DateTimeV2] Leading article swallowed (Issue #3005)

Open · martinmueller4voice opened this issue 2 years ago · 2 comments

Bug: A leading article before the date is part of the matched text

To Reproduce: "Der 13.6.2022 war ein Montag" matches the date 13.6.2022 correctly, but the match also includes the article "Der". As a result, match.Text is "der 13.6.2022" and match.Start is 0 when it should be 4. The same happens with "den" ("Den 4. April 2000 wird er nie vergessen"), but not with "am" ("Am 24.02.2022 überfiel Russland die Ukraine"). English shows this behavior too: "The first of June 2022 was a Wednesday" matches "the first of june".

Expected behavior: Only the date should be included in the matched text. Otherwise, words go missing when you try to re-format just the date.

Platform

  • Platform: .NET
  • Environment: nuget package
  • Package version: v1.8.4

martinmueller4voice, Jun 30 '22 07:06

Thanks for reporting the issue in German too, @martinmueller4voice! This is a known issue where the behaviour is not perfect. However, since it's consistent with English and the behaviour has been there for a while, changing it could be considered a breaking change for users. As it's not technically incorrect, the issue is low priority, and the behaviour will likely only be changed together with other major changes in a future v2 (no planned date yet).

tellarin, Jun 30 '22 08:06

Hi Börje! Thanks for your prompt answer, but I'm a bit at a loss here and don't know how to work around it. I'm building post-processing for speech recognition results and want to give users the option to have dates formatted consistently. For this, I need only the plain dates to be matched and parsed, so that each match can then be replaced with the desired format. When the article is matched as well, the whole sentence can get mutilated. Do you happen to have any idea how to get around this?

Thanks in advance,
Martin

martinmueller4voice, Jun 30 '22 08:06
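One possible client-side workaround is sketched below in Python. It rests on several assumptions. The recognizer's results are assumed to have been copied into a hypothetical `Match` record holding the text, the start offset, and an end offset that is assumed to be inclusive. The article list is a guess to be extended per language. `fmt` is any caller-supplied date formatter. The sketch reads the span from the original sentence rather than from `match.Text`, since the issue shows that text lowercased. It peels off a leading article and rewrites only the remaining date. Matches are spliced right to left so earlier offsets stay valid. The same logic should port directly to C# for the .NET package.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-in for a recognizer result (not the library's own type).
# ASSUMPTION: `end` is the inclusive index of the match's last character.
@dataclass
class Match:
    text: str
    start: int
    end: int


# ASSUMPTION: articles to peel off; extend this list for other languages.
LEADING_ARTICLE = re.compile(r"(?:der|die|das|den|dem|the)\s+", re.IGNORECASE)


def strip_leading_article(sentence: str, m: Match) -> Match:
    """Return a Match whose span no longer begins with an article."""
    # Read from the original sentence to keep its casing ("Der", not "der").
    span = sentence[m.start:m.end + 1]
    article = LEADING_ARTICLE.match(span)  # anchored at the span's start
    if article is None:
        return m
    new_start = m.start + article.end()
    return Match(sentence[new_start:m.end + 1], new_start, m.end)


def reformat_dates(sentence: str, matches: List[Match],
                   fmt: Callable[[str], str]) -> str:
    """Replace only the date part of each match with fmt(date_text)."""
    result = sentence
    # Splice right to left so offsets of matches further left stay valid.
    for m in sorted(matches, key=lambda x: x.start, reverse=True):
        d = strip_leading_article(sentence, m)
        result = result[:d.start] + fmt(d.text) + result[d.end + 1:]
    return result
```

For example, `reformat_dates("Der 13.6.2022 war ein Montag", [Match("der 13.6.2022", 0, 12)], lambda d: "<" + d + ">")` returns `"Der <13.6.2022> war ein Montag"`, so the article stays in the sentence and only the date is rewritten. Matches such as "Am 24.02.2022" pass through unchanged because "am" is not in the article list.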