TimeLineCurator

TERNIP not catching "early", "middle", "late" decade modifiers

Open joshuarrrr opened this issue 10 years ago • 4 comments

For example, TERNIP should tag "late 1990s" or "early 1980s".

Right now, only the decade itself is recognized, so early 1990s is always represented as 1990-2000.

joshuarrrr avatar Jun 06 '15 00:06 joshuarrrr

@joshuarrrr since that's an approximate description of a time range, how would you interpret it? (For example, "early 1980s" ~ 1980-1985 and "late 1990s" ~ 1995-2000?)

wsdookadr avatar Jun 09 '15 22:06 wsdookadr

@wsdookadr - I'm not so concerned with the precise interpretation, I'd just like to see the modifier word included in the time tag so we can interpret it. Note that in similar cases, the modifier is included in the time tag:

  • "late July"
  • "mid-1980s"

But note how these other expressions are tagged (indicated with quotes):

  • late "1980s"
  • late "80s"
  • late-1980s (not tagged at all)
  • mid-80s (not tagged at all)
  • mid "1980s"
  • mid "80s"
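
To make the gap concrete, here is a minimal sketch of a pattern that would keep the modifier together with the decade for all of the forms listed above, including the hyphenated ones. The names `DECADE_RE` and `find_decades` are made up for illustration; this is not TERNIP's rule format and nothing here is taken from its expressions.py.

```python
import re

# Hypothetical pattern (an assumption, not TERNIP's actual rule):
# an optional modifier joined by whitespace or a hyphen, then a
# two- or four-digit decade ending in "0s" (e.g. "80s", "1980s").
DECADE_RE = re.compile(
    r"\b(?:(?P<modifier>early|mid|middle|late)[\s-]+)?"
    r"(?P<decade>(?:\d{2})?\d0)'?s\b",
    re.IGNORECASE,
)


def find_decades(text):
    """Return (modifier, decade) pairs; modifier is None when absent."""
    results = []
    for match in DECADE_RE.finditer(text):
        modifier = match.group("modifier")
        results.append(
            (modifier.lower() if modifier else None, match.group("decade"))
        )
    return results


if __name__ == "__main__":
    for phrase in ["late 1980s", "late-1980s", "mid-80s", "mid 80s", "the 1990s"]:
        print(phrase, "->", find_decades(phrase))
```

The point is only that the modifier and the decade end up in one match, so whatever consumes the tag can decide how to interpret it.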

For TimeLineCurator, I don't think the exact interpretation is particularly critical. As a user, I'd be OK with "late 1990s" interpreted as either 1995-2000 or 1997-2000. Note that even "summer of 1978" currently gets interpreted as "July 1978". Even if that's not ideal, it's still better than just "1978".
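
As a rough illustration of how a consumer could turn the modifier into a year range, here is a sketch using the 1980-1985 and 1995-2000 splits mentioned in this thread. The function name `decade_range` is hypothetical, and the offsets for "mid" are purely an assumption, since nobody here has proposed a range for it.

```python
# Offsets (in years) from the start of the decade. "early" and "late"
# follow the splits suggested in this thread; the "mid" offsets are an
# assumption made up for this sketch.
MODIFIER_OFFSETS = {
    None: (0, 10),     # whole decade, e.g. 1990-2000
    "early": (0, 5),   # e.g. 1980-1985
    "mid": (3, 7),     # assumption
    "late": (5, 10),   # e.g. 1995-2000
}


def decade_range(decade_start, modifier=None):
    """Return (start_year, end_year) for a decade and optional modifier."""
    if decade_start % 10 != 0:
        raise ValueError("decade_start must be a multiple of 10")
    start_offset, end_offset = MODIFIER_OFFSETS[modifier]
    return decade_start + start_offset, decade_start + end_offset


if __name__ == "__main__":
    print(decade_range(1980, "early"))
    print(decade_range(1990, "late"))
    print(decade_range(1990))
```

Swapping in a different split (say 1997-2000 for "late") only means changing the table, which is why the exact numbers matter less than having the modifier available.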

joshuarrrr avatar Jun 09 '15 23:06 joshuarrrr

I just read that ternip implements the TimeML specification and the TIMEX3 format for time references. I searched through the test datasets and found some relevant temporal annotations in the TempEval-3 silver dataset.

grep -i "\(mid\|early\|late\).[0-9]\+s<\/" `find -name "*.tml"`

Here are a few results:

XIN_ENG_20051110.0246.tml

<TIMEX3 type="DATE" value="197" tid="t16">the early 1970s</TIMEX3>

AFP_ENG_20051206.0199.tml

<TIMEX3 type="DATE" value="198" tid="t3">the early 1980s</TIMEX3>

AFP_ENG_20061220.0533.tml

<TIMEX3 type="DURATION" value="199" tid="t3">the late 1990s</TIMEX3>

XIN_ENG_20061113.0337.tml

<TIMEX3 type="DURATION" value="199" tid="t4">the early 1990s</TIMEX3>

APW19980219.0476.tml

<TIMEX3 tid="t142" type="DATE" value="197" mod="END"
temporalFunction="false" functionInDocument="NONE">
the late 1970s</TIMEX3>

wsj_0173.tml

<TIMEX3 tid="t26" type="DATE" value="198" mod="START"
temporalFunction="false" functionInDocument="NONE">
the early 1980s</TIMEX3>

wsj_1014.tml

<TIMEX3 tid="t2054" type="DATE" value="199" mod="MID"
temporalFunction="false" functionInDocument="NONE">
the mid-1990s</TIMEX3>

In the examples above, the "type" attribute takes two values: DATE and DURATION. So ternip may also have to work out which expression is a duration and which is just an approximate date. I'm thinking "(during|throughout) the (early|late) 1980s" could be seen as a DURATION, and "(in|by) the late 1980s" would be a DATE. The tagged datasets seem to support that. I've also noticed that the "value" is a truncated year with the last digit missing (for example, "198" for the 1980s). The xsd schema allows this: the regex in <xs:simpleType name="Date"> accepts it, and so does the <xs:simpleType name="Duration"> rule.
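
A minimal sketch of that preposition heuristic, plus the truncated decade value, might look like the following. The names `guess_timex_type` and `decade_value` are hypothetical, and defaulting to DATE when no known preposition leads the phrase is an assumption of this sketch, not something the datasets or the spec state.

```python
# Prepositions taken from the heuristic described above.
DURATION_PREPOSITIONS = {"during", "throughout"}
DATE_PREPOSITIONS = {"in", "by"}


def guess_timex_type(phrase):
    """Guess the TIMEX3 type from the leading preposition of a phrase."""
    words = phrase.strip().split()
    first_word = words[0].lower() if words else ""
    if first_word in DURATION_PREPOSITIONS:
        return "DURATION"
    # "in"/"by" map to DATE; anything else also falls back to DATE,
    # which is an assumption made for this sketch.
    return "DATE"


def decade_value(year):
    """Truncate a four-digit year to its decade prefix, e.g. 1985 -> '198'."""
    return str(year // 10)


if __name__ == "__main__":
    print(guess_timex_type("during the early 1980s"), decade_value(1980))
    print(guess_timex_type("by the late 1980s"), decade_value(1980))
```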

If it's a TIMEX3 with type "DURATION", then there should also be a TLINK with relType="DURING" that ties an EVENT to the TIMEX3.

About annotating the seasons: the TimeML 1.2.1 annotation guidelines (and this example) say seasons would be annotated like so:

<TIMEX3 tid="t6" type="DATE" value="1964-SU">summer of 1964</TIMEX3> 

I suppose the value attribute in this case follows the format: YYYY-(WI|SU|FA|SP) which is supported by the xsd schema rule <xs:simpleType name="Season">
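
Assuming that format, building the value for a phrase like "summer of 1978" could be sketched as below. `SEASON_RE` and `season_value` are hypothetical names, and treating "autumn" as FA is my assumption.

```python
import re

# Season codes from the YYYY-(WI|SU|FA|SP) format above.
# Mapping "autumn" to FA is an assumption of this sketch.
SEASON_CODES = {
    "winter": "WI",
    "spring": "SP",
    "summer": "SU",
    "fall": "FA",
    "autumn": "FA",
}

SEASON_RE = re.compile(
    r"\b(winter|spring|summer|fall|autumn)(?:\s+of)?\s+(\d{4})\b",
    re.IGNORECASE,
)


def season_value(text):
    """Return a 'YYYY-SS' value for the first season expression, or None."""
    match = SEASON_RE.search(text)
    if match is None:
        return None
    season, year = match.group(1).lower(), match.group(2)
    return f"{year}-{SEASON_CODES[season]}"


if __name__ == "__main__":
    print(season_value("summer of 1964"))
    print(season_value("the summer of 1978"))
```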

In terms of the implementation, I'm thinking some new patterns need to be added to expressions.py and some new logic would need to be added to date_functions.py.

UPDATE (2015-06-14): From the examples found in the training datasets, it seems that only "mid-1990s" had an attribute mod="MID" that would reflect the same structure in the TimeML representation. It seems that the same attribute "mod" can also take the values "START" or "END", which would account for "early" or "late".
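
Putting the "mod" attribute together with the truncated value, emitting a tag shaped like the dataset examples could be sketched like this. `MODIFIER_TO_MOD` and `decade_timex3` are hypothetical names for illustration only; this is not how date_functions.py is structured, and the sketch always uses type="DATE".

```python
from xml.sax.saxutils import escape, quoteattr

# Modifier words mapped to the "mod" values seen in the examples above.
MODIFIER_TO_MOD = {
    "early": "START",
    "mid": "MID",
    "middle": "MID",
    "late": "END",
}


def decade_timex3(text, decade_start, modifier=None, tid="t1"):
    """Render a TIMEX3 element for a decade expression (sketch only)."""
    attrs = [
        ("tid", tid),
        ("type", "DATE"),
        ("value", str(decade_start // 10)),  # e.g. 1970 -> "197"
    ]
    if modifier is not None:
        attrs.append(("mod", MODIFIER_TO_MOD[modifier.lower()]))
    attr_text = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
    return f"<TIMEX3 {attr_text}>{escape(text)}</TIMEX3>"


if __name__ == "__main__":
    print(decade_timex3("the late 1970s", 1970, "late", tid="t142"))
    print(decade_timex3("the mid-1990s", 1990, "mid", tid="t2054"))
```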

wsdookadr avatar Jun 11 '15 05:06 wsdookadr

@wsdookadr wow, nice digging. It looks like we now have a good idea how to address the problem.

joshuarrrr avatar Aug 26 '15 19:08 joshuarrrr