lexpredict-lexnlp

Not able to Extract "multiple" dates using get_date

Open suyashdb opened this issue 4 years ago • 1 comments

>>> import lexnlp.extract.en.dates
>>> text = "This agreement is dated on 15th july 2018. This agreement shall terminate on the 15th day of March, 2020. "
>>> print(list(lexnlp.extract.en.dates.get_dates(text)))
[datetime.date(2020, 3, 15)]

Currently, the get_dates and get_raw_date_list methods return only the last occurrence of a date entity. In the text above, I expected 15th July 2018 along with 15th March 2020.

Is there a way to grab all dates from a text/sentence?

Edit: The issue is probably that the first date in my text was not recognized and hence not extracted. Here is an example:

>>> text = "AUTO XX IF SSR TKNA/E OR FA NOT RCVD BY RJ BY 29MAY19 1350 DOH LT,REF IATA PRVD PAX"
>>> list(lexnlp.extract.en.dates.get_raw_dates(text))
[]

suyashdb, Jul 11 '20 23:07

Hello @suyashdb,

Thank you for filing this issue.

  1. Could you please tell us which version of LexNLP you are using? You can, for example, run lexnlp.__version__ in a REPL to quickly discern the version.

  2. I cannot replicate your first example:

In[2]: from lexnlp.extract.en.dates import get_dates_list
In[3]: text = "This agreement is dated on 15th july 2018. This agreement shall terminate on the 15th day of March, 2020. "
In[4]: get_dates_list(text)

Out[4]: [datetime.date(2018, 7, 15), datetime.date(2020, 3, 15)]

  3. It seems that one of the checks LexNLP performs to rule out false positives is preventing date extraction from occurring in your second example. Is 1350 a timestamp (13:50), part of an address, or intended to be some other integer? Could you tell me what domain (agreement, financial document, etc.) your second example is from, and how often such a construction (DD<month, spelt out>YY <integer>) occurs? We are conscious of the possibility of introducing regressions when changing the LexNLP extraction functions to handle such cases, and would like to know how frequently such constructions occur.

afparsons, Aug 07 '20 18:08
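
For the second example, one possible stopgap is to pre-scan the text for compact tokens of the form DDMMMYY (such as 29MAY19) separately from LexNLP. The sketch below uses only the Python standard library. `COMPACT_DATE`, `MONTHS`, and `find_compact_dates` are hypothetical names, not LexNLP APIs, and the code assumes English three-letter month abbreviations and that two-digit years fall in the 2000s.

```python
import re
from datetime import date
from typing import List

# Hypothetical helper, not part of LexNLP.
# Assumption: months are English three-letter abbreviations.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}

# Matches compact tokens such as "29MAY19": two digits, a month, two digits.
COMPACT_DATE = re.compile(
    r"\b(\d{2})(" + "|".join(MONTHS) + r")(\d{2})\b",
    re.IGNORECASE,
)


def find_compact_dates(text: str) -> List[date]:
    """Return every DDMMMYY token in `text` that forms a valid calendar date."""
    found = []
    for day, month, year in COMPACT_DATE.findall(text):
        try:
            # Assumption: two-digit years belong to the 2000s.
            found.append(date(2000 + int(year), MONTHS[month.upper()], int(day)))
        except ValueError:
            # Skip impossible dates such as 31FEB19.
            continue
    return found
```

Called on the text from the second example, `find_compact_dates` returns `[datetime.date(2019, 5, 29)]`. The trailing `1350` is ignored, so it still has to be interpreted separately if it is a timestamp.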