dateparser
[WIP] add find_date function and tests: finds the first date
This function is similar to search_dates, but returns only the first date and is better suited to short strings. It uses a brute-force approach, but has more predictable performance (at least on shorter strings; we limit the length to 100) and gave better quality in our tests, although it is still not perfect.
Not sure if it should be exposed in its current form, but it may be useful for future development.
NOTE: this PR is not currently being worked on and is free to pick up.
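For anyone picking this up, here is a rough sketch of the brute-force idea. It is not the code in this PR: the name `find_first_date`, the whitespace tokenization, the punctuation stripping and the longest-span-first order are all assumptions. The parser is passed in as a callable so that `dateparser.parse` can be plugged in, while the stub below keeps the example runnable without dateparser installed.

```python
import re
from datetime import datetime
from typing import Callable, Optional, Tuple

Parser = Callable[[str], Optional[datetime]]


def find_first_date(
    text: str, parse: Parser, max_length: int = 100
) -> Optional[Tuple[str, datetime]]:
    """Hypothetical sketch: return (matched text, datetime) for the first date."""
    # Cap the input, mirroring the length limit mentioned in the PR.
    text = text[:max_length]
    # Whitespace-separated tokens together with their character offsets.
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    n = len(spans)
    for i in range(n):  # the earliest start position wins
        for j in range(n, i, -1):  # at that start, try the longest span first
            # Assumed cleanup: drop surrounding punctuation such as "2021,".
            candidate = text[spans[i][0]:spans[j - 1][1]].strip(",.;:()")
            parsed = parse(candidate) if candidate else None
            if parsed is not None:
                return candidate, parsed
    return None


def stub_parse(s: str) -> Optional[datetime]:
    """Stand-in for dateparser.parse that only knows two fixed formats."""
    for fmt in ("%Y-%m-%d", "%d %B %Y"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    return None


print(find_first_date("Meeting moved to 5 March 2021, not 2021-04-01", stub_parse))
# → ('5 March 2021', datetime.datetime(2021, 3, 5, 0, 0))
```

With dateparser installed, `find_first_date(text, dateparser.parse)` would try dateparser on every candidate span. The number of candidates grows quadratically with the number of tokens, which is why capping the input length keeps the running time predictable.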
TODO:
- [ ] gather feedback on whether the API (finding the first match) makes sense. Maybe instead we could have an API similar to search_dates and make this a generator, allowing the user to stop at the first match, although that would likely make the implementation a bit harder.
- [ ] also, what about the new vs. old search_dates? There are likely cases where the old search_dates works better in terms of performance or quality, and we won't maintain full backwards compatibility if we replace it.
- [ ] clean up the API regarding languages (not sure if auto-adding 'en' and auto-detection is what we want); that would require some test changes
- [ ] clean up the date cleanup code; it might be invalid for some languages?
- [ ] take care of the docs
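The generator alternative from the first TODO item could look roughly like this. Again, this is a hypothetical sketch (the name `iter_dates`, the tokenization and the cleanup are assumptions, not code from this PR): it yields non-overlapping matches from left to right, and `next(..., None)` recovers the find-first behaviour. Compared with a find-first loop, the main extra state is the position at which to resume after a match.

```python
import re
from datetime import datetime
from typing import Callable, Iterator, Optional, Tuple

Parser = Callable[[str], Optional[datetime]]


def iter_dates(
    text: str, parse: Parser, max_length: int = 100
) -> Iterator[Tuple[str, datetime]]:
    """Hypothetical sketch: lazily yield (matched text, datetime) pairs."""
    text = text[:max_length]
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    i, n = 0, len(spans)
    while i < n:
        for j in range(n, i, -1):  # longest span starting at token i first
            candidate = text[spans[i][0]:spans[j - 1][1]].strip(",.;:()")
            parsed = parse(candidate) if candidate else None
            if parsed is not None:
                yield candidate, parsed
                i = j  # resume after the match so results never overlap
                break
        else:
            i += 1  # nothing parsed starting at token i; move on


def stub_parse(s: str) -> Optional[datetime]:
    """Stand-in for dateparser.parse that only knows two fixed formats."""
    for fmt in ("%Y-%m-%d", "%d %B %Y"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    return None


text = "Meeting moved to 5 March 2021, not 2021-04-01"
first = next(iter_dates(text, stub_parse), None)  # stops after the first match
everything = list(iter_dates(text, stub_parse))
```

Because the generator is lazy, taking only the first item performs the same parse attempts as a dedicated find-first function would.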
Codecov Report
Base: 98.26% // Head: 98.21% // Decreases project coverage by 0.05% :warning:
Coverage data is based on head (2af6489) compared to base (255c421). Patch coverage: 94.44% of modified lines in pull request are covered.
```diff
@@            Coverage Diff             @@
##           master     #931      +/-   ##
==========================================
- Coverage   98.26%   98.21%   -0.06%
==========================================
  Files         231      232       +1
  Lines        2597     2633      +36
==========================================
+ Hits         2552     2586      +34
- Misses         45       47       +2
```
| Impacted Files | Coverage Δ |
|---|---|
| dateparser/find_date.py | 94.44% <94.44%> (ø) |
Many of the issues associated with search_dates could be solved if this code were used to improve the current search_dates.
For example, #930, #918, #921 and #894 would be resolved.
Tests seem to be failing on Python 3.5 because the guarantee that keyword arguments are passed in the same order they appear in the code (from left to right) was only introduced in Python 3.6:
PEP 468 -- Preserving the order of **kwargs in a function.
A possible solution for the issue: pass the input as a sequence of tuples to preserve ordering.
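A minimal illustration of the ordering problem and of the tuple-based workaround. The helper names below are hypothetical and not taken from the dateparser test suite; the point is only that `**kwargs` order is guaranteed from Python 3.6 on (PEP 468), while an explicit sequence of `(key, value)` tuples is ordered on every version.

```python
from collections import OrderedDict
from typing import Iterable, List, Tuple


def names_from_kwargs(**options: object) -> List[str]:
    # Matches the call-site order only on Python 3.6+ (PEP 468);
    # on Python 3.5 the iteration order of `options` is arbitrary.
    return list(options)


def names_from_pairs(options: Iterable[Tuple[str, object]]) -> List[str]:
    # A sequence of (name, value) tuples keeps its order on any version.
    return [name for name, _ in options]


# OrderedDict only helps if it is built from pairs: on Python 3.5,
# OrderedDict(b=1, a=2) already receives the keywords in arbitrary order.
ordered = OrderedDict([("b", 1), ("a", 2)])

print(names_from_kwargs(b=1, a=2))  # ['b', 'a'] on Python 3.6+
print(names_from_pairs([("b", 1), ("a", 2)]))  # ['b', 'a'] on any version
```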
We can close this PR after merging https://github.com/scrapinghub/dateparser/pull/945