dateparser
dateparser copied to clipboard
Fallback to the builtin re module when regex is not available
I'm deploying a project on a service that requires all the dependencies to be written in pure Python, C dependent libraries such as pandas are not supported.
I've noticed that regex
depends on C, but seems just to be a drop-in replacement for the built-in re
module. Does it make sense to catch an ImportError
and fall back to the built-in one?
I so wish dateparser
didn't depend on regex
for that very reason. I'm trying to deploy to AWS Lambda from OS X, and since there's no manylinux1
wheel of regex
, I can't depend on dateparser
unless I cross-compile regex
for Linux using Docker or something from OS X.
I've replaced the imports and run the tests, I've got sre_constants.error: look-behind requires fixed-width pattern
for RE_SANITIZE_PERIOD = re.compile(r'(?<=\D+)\.', flags=re.U)
in dateparser/date.py
.
Thanks for checking that @wrar. regex
was chosen not only because of its improved memory handling, but also for look-behind functions which classic re
does not support.
Fallback is still possible, but not without losing some functionality.
If I recall properly, there were plans to move regex
to core Python.
Would making regex
an optional dependency (i.e., following @juanriaza's suggestion) be acceptable? @wRAR, what tests were failing? What functionality would we lose without regex
installed?
@jmehnle unfortunately, the error is triggered just by importing dateparser.date
.
OTOH RE_SANITIZE_PERIOD
is used in dateparser.date.sanitize_date()
which is used in dateparser.date.DateDataParser.get_date_data()
which is probably the main function of dateparser
?
Any update on this issue ?
dateparser fits perfectly to my needs. I would like to use it in AWS Lambda, and the traditional way to package it is to use
pip install -t dependencies-folder -r requirements.txt
zip dependencies-folder/* zip-which-will-be-uploaded-to-aws-lambda.zip
Problem is that this installation is failing at installing regex, and I get this error :
Runtime.ImportModuleError: Unable to import module 'handler': No module named 'regex._regex'