xrenner

Mechanism for date-time awareness

Open amir-zeldes opened this issue 8 years ago • 1 comment

Some indexical expressions such as 'this year' or 'today' can be coreferent with an actual lexical NP in the text, especially in news texts where the article is dated. A mechanism should be devised to recognize likely indications of the text's current date-time in order to capture these.

As a proof-of-concept prototype, we can try to find entire utterances that contain only a date/time. If a text contains an utterance matching one of the typical date/time patterns, some global values should be set, modeled as properties of a new object representing the entire document:

  • document.date
  • document.time

If these are not fully known, we can still record partial date/time information. These partial properties should always be available, even when the full date and time are known (as convenience functions):

  • document.weekday
  • document.year

When processing documents, common noun markables can be matched against configurable patterns (case insensitive), which map to certain document properties:

this year -> document.year

As soon as a suspected date pattern is encountered, it will be added to the LexData object's coref.tab dictionary. The workflow is:

  • Encounter a sentence consisting solely of a date, based on some predefined set of date formats
  • Update document object properties (document.date)
  • Add entries to lex.coref to anticipate this year -> 2016 (once we know it's 2016)
  • Now normal matching should catch this year -> 2016
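The workflow above could be sketched roughly as follows. This is only an illustration under several assumptions: the date formats in `DATE_FORMATS` are guesses at "typical" patterns, the plain dictionaries `document` and `coref_table` stand in for the proposed document object and for lex.coref, and the function names are hypothetical. Month names are parsed with `strptime`, so the sketch also assumes an English (default C) locale:

```python
import datetime

# Assumed set of date formats; a real list would be configurable
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d")


def parse_date_only(sentence):
    """Return a date if the whole sentence is a date, else None."""
    text = sentence.strip().rstrip(".")
    for fmt in DATE_FORMATS:
        try:
            # strptime fails unless the entire string matches the format
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def register_document_date(sentence, document, coref_table,
                           year_refs=("this year", "the current year")):
    """Update document properties and anticipate e.g. this year -> 2016."""
    date = parse_date_only(sentence)
    if date is None:
        return False
    document["date"] = date
    document["year"] = date.year
    for phrase in year_refs:
        # Keys are lowercased because matching is case insensitive
        coref_table[phrase.lower()] = str(date.year)
    return True
```

Calling `register_document_date("March 7, 2016", document, coref_table)` on empty dictionaries fills in `document["year"]` and adds both year phrases to the table, after which normal matching could link 'this year' to '2016'. A sentence that merely contains a date is rejected, since only date-only utterances count in this prototype.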

The list of patterns should be a semicolon-separated entry in the config.ini for the language, e.g.:

year_ref=this year;the current year
day_ref=today
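Entries like these could be read with Python's standard `configparser`, as in the sketch below. The section name `main`, the function `load_date_refs`, and the mapping of `day_ref` to the document's date are assumptions for illustration; the issue itself only specifies that 'this year' maps to document.year.

```python
import configparser

# Sample config text; the [main] section header is an assumption
SAMPLE = """
[main]
year_ref=this year;the current year
day_ref=today
"""

# Assumed mapping from config keys to document property names
REF_KEYS = {"year_ref": "year", "day_ref": "date"}


def load_date_refs(ini_text, section="main"):
    """Map each configured phrase (lowercased) to a document property."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    mapping = {}
    for key, prop in REF_KEYS.items():
        raw = parser.get(section, key, fallback="")
        # Entries are semicolon-separated; skip empty pieces
        for phrase in raw.split(";"):
            phrase = phrase.strip().lower()
            if phrase:
                mapping[phrase] = prop
    return mapping
```

Lowercasing the phrases at load time keeps the later markable lookup case insensitive without repeating the normalization on every match.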

amir-zeldes, Mar 07 '16 16:03