duckling
duckling copied to clipboard
[EN/DE/ES/...] 'Part of Day' rule coerces timespans
Some time spans are incorrectly interpreted, yielding values that seem caused by a defaulting rule. Parsing is correct, but when building the final Entity
s for some reason the time span from
and to
are incorrect.
All of these report the correct time spans:
vormittags von 3 bis 11 Uhr
vormittags von 4 bis 11 Uhr
...
vormittags von 10 bis 11 Uhr
If the lower bound is 1
or 2
the lower bound is (incorrectly) set to 03:00
.
On the other side, if the upper bound is 12
or greater then the time span defaults to 03:00 to 12:00
consistently. I tracked down the issue to the rule time-of-day (latent)
:
https://github.com/facebook/duckling/blob/d8888e2ff842c4dacd570989570c101cf7b43153/Duckling/Time/DE/Rules.hs#L959
As far as I understand, 'latent' means that there's some ambiguity. For example, the number 7
could mean several things depending on the context.
Lets change this rule and use the check n < 13
like in the EN
version:
https://github.com/facebook/duckling/blob/d8888e2ff842c4dacd570989570c101cf7b43153/Duckling/Time/EN/Rules.hs#L577
This will no longer break the lower bound when the upper bound is 12
. Nevertheless, the issue remains for the upper bound: values 12 ...
as upper bound result in ranges 03:00 to 12:00
consistently. Do these values ring any bell? some kind of defaulting rule?
The morning
rule may be the cause of said values (03:00
and 12:00
):
https://github.com/facebook/duckling/blob/d8888e2ff842c4dacd570989570c101cf7b43153/Duckling/Time/DE/Rules.hs#L1339-L1350
Shouldn't be 13
the upper bound, since it's non-inclusive?
It seems that the morning
rule has precedence over the time-of-day (latent)
rule, the latter correctly detecting the hours in the input.
How can we alter the precedence of said rules? Explicit hours should not be coerced into a range given some modifier like morning
.
Update: This issue is not exclusive to German, it also affects languages like English. For example, the input "morning from 9 to 12" returns the time span 00:00 to 01:00
. The root cause seems to be the rule partOfDays
which is similar to the German morning
rule:
https://github.com/facebook/duckling/blob/72f45e8e2c7385f41f2f8b1f063e7b5daa6dca94/Duckling/Time/EN/Rules.hs#L1001-L1019
The lower bound seems to be coerced into the start of 'morning' (00:00
). I can't seem to find what sets the upper bound to 01:00
though.
Any other input where the upper bound is greater than 12
results in the time span 00:00 to 12:00
, which are the bounds for "morning"
Update: The same issue happens with Spanish. For example, the input "a la mañana de 9 a 13" returns the time span 04:00 to 12:00
which, again, are the bounds for "morning":
https://github.com/facebook/duckling/blob/72f45e8e2c7385f41f2f8b1f063e7b5daa6dca94/Duckling/Time/ES/Rules.hs#L939-L950
Inputs with 12
result in 12:00
as the upper bound (it should be 13:00
). Values higher than 12
as upper bound result in time spans 04:00 to 12:00
consistently.
The same solution should apply to those 3 languages (I'm sure there are more affected by this bug): the "morning" / "part of day" rule should not have priority over literal values.
Prettified debug output:
> debug (makeLocale EN Nothing) "morning between 9 and 12" [Seal Time]
intersect (morning between 9 and 12)
-- part of days (morning)
-- -- regex (morning)
-- between <time-of-day> and <time-of-day> (interval) (between 9 and 12)
-- -- regex (between)
-- -- time-of-day (latent) (9)
-- -- -- integer (numeric) (9)
-- -- -- -- regex (9)
-- -- regex (and)
-- -- time-of-day (latent) (12)
-- -- -- integer (numeric) (12)
-- -- -- -- regex (12)
[
{
"body": "morning between 9 and 12",
"dim": "time",
"end": 24,
"latent": false,
"start": 0,
"value": {
"from": {
"grain": "hour",
"value": "2013-02-12T00:00:00.000-02:00"
},
"to": {
"grain": "hour",
"value": "2013-02-12T01:00:00.000-02:00"
},
"type": "interval",
"values": [
{
"from": {
"grain": "hour",
"value": "2013-02-12T00:00:00.000-02:00"
},
"to": {
"grain": "hour",
"value": "2013-02-12T01:00:00.000-02:00"
},
"type": "interval"
},
{
"from": {
"grain": "hour",
"value": "2013-02-12T09:00:00.000-02:00"
},
"to": {
"grain": "hour",
"value": "2013-02-12T12:00:00.000-02:00"
},
"type": "interval"
},
{
"from": {
"grain": "hour",
"value": "2013-02-13T00:00:00.000-02:00"
},
"to": {
"grain": "hour",
"value": "2013-02-13T01:00:00.000-02:00"
},
"type": "interval"
}
]
}
}
]
Interestingly, the correct answer appears in the possible answers in the response.
@chessai I don't think that's the correct answer considering that the upper bound should be exclusive: we should get 09:00 - 13:00
. I think that the 12:00
that we get as upper bound is just the upper bound of morning
, that is 00:00 - 12:00
.