dateparser

Fallback to the builtin re module when regex is not available

Open juanriaza opened this issue 5 years ago • 6 comments

I'm deploying a project on a service that requires all dependencies to be written in pure Python; libraries that depend on C, such as pandas, are not supported.

I've noticed that regex depends on C, but it seems to be just a drop-in replacement for the built-in re module. Would it make sense to catch an ImportError and fall back to the built-in one?
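
A minimal sketch of the fallback being proposed, assuming the calling code only uses features that both modules share. The HAVE_REGEX flag is a hypothetical name used for illustration and is not part of dateparser:

```python
# Hypothetical compatibility shim; not taken from dateparser itself.
try:
    import regex as re  # third-party module with a C extension
    HAVE_REGEX = True
except ImportError:
    import re  # pure standard-library fallback
    HAVE_REGEX = False

# A pattern that uses only syntax supported by both modules.
pattern = re.compile(r'\d{4}-\d{2}-\d{2}', flags=re.U)
match = pattern.search('released on 2018-08-28')
found = match.group(0)
```

Code that needs regex-only features could check HAVE_REGEX and degrade or raise a clear error instead of failing at import time.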

juanriaza, Aug 28 '18 12:08

I so wish dateparser didn't depend on regex for that very reason. I'm trying to deploy to AWS Lambda from OS X, and since there's no manylinux1 wheel of regex, I can't depend on dateparser unless I cross-compile regex for Linux using Docker or something from OS X.

jmehnle, Sep 12 '18 17:09

I've replaced the imports and run the tests. I got sre_constants.error: look-behind requires fixed-width pattern for RE_SANITIZE_PERIOD = re.compile(r'(?<=\D+)\.', flags=re.U) in dateparser/date.py.
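
A small sketch that reproduces the error with the built-in re module. It also tries a fixed-width variant: for this one pattern, a look-behind on \D+ is satisfied exactly when the single preceding character is a non-digit, so (?<=\D) should match the same positions. The sample string and the use of sub() to strip the period are assumptions for illustration, and other patterns in dateparser may not be this easy to rewrite:

```python
import re

variable_width = r'(?<=\D+)\.'  # pattern quoted from dateparser/date.py
fixed_width = r'(?<=\D)\.'      # assumed equivalent for this pattern only

# The standard library rejects variable-width look-behind at compile time.
try:
    re.compile(variable_width, flags=re.U)
    stdlib_accepts_variable_width = True
except re.error:
    stdlib_accepts_variable_width = False

# The fixed-width form compiles with the standard library.
RE_SANITIZE_PERIOD_FIXED = re.compile(fixed_width, flags=re.U)

# Hypothetical input: only a period that follows a non-digit is removed.
sample = 'Sept. 19, 2018 at 10.30'
cleaned = RE_SANITIZE_PERIOD_FIXED.sub('', sample)
print(cleaned)  # Sept 19, 2018 at 10.30
```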

wRAR, Sep 19 '18 10:09

Thanks for checking that, @wRAR. regex was chosen not only for its improved memory handling, but also for variable-width look-behind, which the classic re module does not support. A fallback is still possible, but not without losing some functionality. If I recall correctly, there were plans to move regex into core Python.

asadurski, Sep 19 '18 10:09

Would making regex an optional dependency (i.e., following @juanriaza's suggestion) be acceptable? @wRAR, what tests were failing? What functionality would we lose without regex installed?

jmehnle, Sep 19 '18 16:09

@jmehnle unfortunately, the error is triggered just by importing dateparser.date.

OTOH RE_SANITIZE_PERIOD is used in dateparser.date.sanitize_date() which is used in dateparser.date.DateDataParser.get_date_data() which is probably the main function of dateparser?

wRAR, Sep 19 '18 16:09

Any update on this issue?

dateparser fits my needs perfectly. I would like to use it in AWS Lambda, and the traditional way to package it is to use

pip install -t dependencies-folder -r requirements.txt
(cd dependencies-folder && zip -r ../zip-which-will-be-uploaded-to-aws-lambda.zip .)

The problem is that this installation fails when installing regex, and I get this error:

Runtime.ImportModuleError: Unable to import module 'handler': No module named 'regex._regex'
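
A rough diagnostic sketch for checking whether the compiled part of regex can be imported. The helper name c_extension_available is hypothetical, and the check only says something useful when it runs in an environment that matches the Lambda runtime (Linux), not on the machine where the package was built:

```python
import importlib


def c_extension_available(module_name='regex._regex'):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a missing or unloadable extension.
        return False
    return True


if __name__ == '__main__':
    print('regex C extension importable:', c_extension_available())
```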

thinow, Aug 12 '21 15:08