lexpredict-lexnlp
Not able to extract multiple dates using get_dates
>>> import lexnlp.extract.en.dates
>>> text = "This agreement is dated on 15th july 2018. This agreement shall terminate on the 15th day of March, 2020. "
>>> print(list(lexnlp.extract.en.dates.get_dates(text)))
[datetime.date(2020, 3, 15)]
Currently, the get_dates and get_raw_date_list methods give me only the last occurrence of a date entity. In the text above, I expected 15th July 2018 along with 15th March 2020.
Is there a way to grab all dates from a text/sentence?
Edit: The issue is probably that the first date in my text was not recognized and hence not extracted. Here is an example:
>>> text = "AUTO XX IF SSR TKNA/E OR FA NOT RCVD BY RJ BY 29MAY19 1350 DOH LT,REF IATA PRVD PAX"
>>> list(lexnlp.extract.en.dates.get_raw_dates(text))
[]
Hello @suyashdb,
Thank you for filing this issue.
- Could you please tell us which version of LexNLP you are using? You can, for example, run lexnlp.__version__ in a REPL to quickly check the version.
- I cannot replicate your first example:
In[2]: from lexnlp.extract.en.dates import get_dates_list
In[3]: text = "This agreement is dated on 15th july 2018. This agreement shall terminate on the 15th day of March, 2020. "
In[4]: get_dates_list(text)
Out[4]: [datetime.date(2018, 7, 15), datetime.date(2020, 3, 15)]
- It seems like one of the checks LexNLP performs to rule out false positives is preventing date extraction from occurring in your second example. Is 1350 a timestamp (13:50), part of an address, or intended to be some other integer? Could you tell me what domain (agreement, financial document, etc.) your second example is from, and how often such a construction (DD<month, spelt out>YY <integer>) occurs? We are conscious of the possibility of introducing regressions when changing the LexNLP extraction functions to handle such cases, so the frequency of this pattern matters to us.
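In the meantime, a possible workaround for compact tokens such as 29MAY19 is a small pre-pass outside LexNLP. The sketch below uses only the Python standard library; the helper name find_compact_dates, the regular expression, and the assumption that two-digit years belong to the 2000s are all illustrative and are not part of LexNLP.

```python
import re
from datetime import date

# Hypothetical helper, not part of LexNLP.
# Matches compact DD<MON>YY tokens such as 29MAY19.
COMPACT_DATE = re.compile(r"\b(\d{2})([A-Za-z]{3})(\d{2})\b")

MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def find_compact_dates(text, century=2000):
    """Return a list of dates parsed from DD<MON>YY tokens in text.

    Assumption: two-digit years are offset by `century` (2000 by default).
    """
    found = []
    for day, month_name, year in COMPACT_DATE.findall(text):
        month = MONTHS.get(month_name.upper())
        if month is None:
            # Three letters that are not a month abbreviation.
            continue
        try:
            found.append(date(century + int(year), month, int(day)))
        except ValueError:
            # Impossible calendar date, for example 31FEB19.
            continue
    return found


text = "AUTO XX IF SSR TKNA/E OR FA NOT RCVD BY RJ BY 29MAY19 1350 DOH LT,REF IATA PRVD PAX"
print(find_compact_dates(text))
```

The results could be merged with whatever LexNLP returns for the same text. The trailing 1350 is ignored here, so if it is a time of day it would need separate handling.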