commonregex-improved
Cases where the pattern is part of a word
Checklist
- [x] There are no similar reports on existing issues (including closed ones).
- [x] I was in the master branch of the latest code.
Is your feature request related to a problem? Please describe
I don't know whether this is the intended behaviour, but the package matches patterns even when they are only part of a larger word. For example, the code:
CommonRegex.dates("The spyware Trojan 12")
returns ['jan 12']. I would expect an empty list instead.
Describe the solution you'd like
I would expect an empty list (in the example above), given that "jan" is part of the token "Trojan".
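The standard regex technique for this is to anchor the match on word boundaries with `\b`, so a month name can only match where a word starts. Below is a minimal sketch using Python's `re` module. The month-day pattern is a hypothetical, simplified one written for illustration; it is not the pattern commonregex or commonregex-improved actually uses.

```python
import re

# Hypothetical, simplified "month day" pattern for illustration only;
# it is NOT the date regex used by commonregex or commonregex-improved.
# [a-z]* lets full month names such as "January" match too.
MONTHS = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?"

# Without word boundaries, "jan" can match inside a longer word like "Trojan".
naive = re.compile(MONTHS + r"\s+\d{1,2}", re.IGNORECASE)

# \b asserts a transition between a word and a non-word character, so the
# month name must start a word and the day number must end one.
bounded = re.compile(r"\b" + MONTHS + r"\s+\d{1,2}\b", re.IGNORECASE)

text = "The spyware Trojan 12"
print(naive.findall(text))                   # ['jan 12']
print(bounded.findall(text))                 # []
print(bounded.findall("Meet me on Jan 12"))  # ['Jan 12']
```

The unbounded version reproduces the `['jan 12']` result from the report, while the bounded version still finds a real date that stands on its own.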
Hey @luizvbo, yeah, this isn't the intended behaviour, and the original commonregex package seems to give the same result. I'll need some time to figure out the proper regex for it. In the meantime, if you need to parse dates, I would suggest using the date functions in the Python standard library.
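A minimal sketch of that suggestion, assuming it refers to `datetime.strptime` from the standard `datetime` module and that the date format is known in advance (here `"%b %d %Y"`, for example "Jan 12 2024"). `parse_date` is a hypothetical helper. Unlike a regex search, `strptime` must match the whole string, so it cannot pull dates out of free text by itself.

```python
from datetime import datetime


def parse_date(text, fmt="%b %d %Y"):
    """Hypothetical helper: parse `text` with a known format, or return None.

    strptime raises ValueError when the string does not match `fmt` exactly,
    so a fragment like "Trojan 12" is rejected rather than partially matched.
    """
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


print(parse_date("Jan 12 2024"))  # 2024-01-12
print(parse_date("Trojan 12"))    # None
```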