parsedatetime

Localization for nlp prefixes (on/at/in)

Open idpaterson opened this issue 7 years ago • 0 comments

The regular expression that nlp uses to pull prefixes into a date expression only supports English. For example, in the phrase "go to a party on January 4", nlp will pull out "on January 4" as the date phrase. Although "on" has no effect on how the date is parsed, it is important to include because it is part of the verbal date expression. In my use case, the phrase above would be converted to a task with the title "go to a party" and the due date January 4.

The current implementation uses a regular expression to match against the concatenation of the text preceding the date phrase and a number signifying whether the phrase represents a date, a time, or units. In the example above, RE_NLP_PREFIX would be matched against the string "go to a party on 1". This would be awkward to localize as currently implemented.
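To make the matching step described above concrete, here is a minimal sketch. `PREFIX_RE` and `find_prefix` are hypothetical stand-ins, not the actual RE_NLP_PREFIX or any parsedatetime function, and the type codes 2 (time) and 3 (units) are assumptions; only 1 for dates is implied by the example above.

```python
import re

# Hypothetical stand-in for RE_NLP_PREFIX; the real pattern may differ.
# Assumed type codes: 1 = date, 2 = time, 3 = units.
PREFIX_RE = re.compile(r'\b(?P<nlp_prefix>on\s+1|(?:at|in)\s+2|in\s+3)$')


def find_prefix(preceding_text, type_code):
    """Return the prefix word that ends preceding_text, or None."""
    # Concatenate the preceding text and the numeric type code,
    # e.g. "go to a party on" + 1 gives "go to a party on 1".
    combined = '%s %d' % (preceding_text.rstrip(), type_code)
    m = PREFIX_RE.search(combined)
    if m is None:
        return None
    # Drop the trailing type code to recover the prefix word itself.
    return m.group('nlp_prefix').split()[0]


print(find_prefix('go to a party on', 1))
print(find_prefix('go to a party on', 2))
```

Because the English words and the type codes are baked into a single pattern, each locale would need its own hand-written regex of this shape.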

Since this logic is only evaluated once a date expression is found and is not used to find dates, I think we have a lot of flexibility in the implementation. Let's assume that some locales will have both prefixes and suffixes. For a simple implementation, each locale can have a dict of prefixes and a dict of suffixes, with keys based on the type of date expression (date, time, unit). This will be more self-documenting than a single regex that knows how to handle dates, times, and units.
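A rough sketch of what the per-locale dicts could look like follows. Everything here is illustrative: `NLP_AFFIXES`, `expand_phrase`, the key names, and the German entries are assumptions, not existing parsedatetime code or locale data. It also assumes prefixes and suffixes are separate whitespace-delimited words, which will not hold for every language.

```python
import re

# Hypothetical per-locale tables keyed by expression type.
NLP_AFFIXES = {
    'en_US': {
        'prefixes': {'date': ['on'], 'time': ['at', 'in'], 'unit': ['in']},
        'suffixes': {'date': [], 'time': [], 'unit': []},
    },
    'de_DE': {
        'prefixes': {'date': ['am'], 'time': ['um'], 'unit': ['in']},
        'suffixes': {'date': [], 'time': [], 'unit': []},
    },
}


def expand_phrase(text, start, end, kind, locale='en_US'):
    """Widen text[start:end] to take in an adjacent prefix or suffix word."""
    affixes = NLP_AFFIXES[locale]
    # Look for a known prefix as the last word before the date phrase.
    before = text[:start].rstrip()
    for prefix in affixes['prefixes'].get(kind, []):
        m = re.search(r'(?:^|\s)(%s)$' % re.escape(prefix), before, re.IGNORECASE)
        if m:
            start = m.start(1)
            break
    # Look for a known suffix as the first word after the date phrase.
    after = text[end:]
    for suffix in affixes['suffixes'].get(kind, []):
        m = re.match(r'\s*%s(?=\s|$)' % re.escape(suffix), after, re.IGNORECASE)
        if m:
            end += m.end()
            break
    return text[start:end]


text = 'go to a party on January 4'
start = text.index('January')
print(expand_phrase(text, start, len(text), 'date'))
```

Since the lookup only runs after a date expression has been found, the tables can stay simple lists of words and never need to encode the date, time, or unit grammar itself.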

idpaterson, Sep 08 '16 22:09