Function not found EXCEPTION: [Errno 38] when using helpers.parallel_bulk in aws lambda

Open Bryson14 opened this issue 2 years ago • 3 comments

Describe the feature:

Elasticsearch version (bin/elasticsearch --version):

elasticsearch-py version (elasticsearch.__versionstr__): elasticsearch==7.19.9

Please make sure the major version matches the Elasticsearch server you are running.

Description of the problem including expected versus actual behavior: We were using the elasticsearch helper parallel_bulk inside an AWS Lambda function. We were running the same version of elasticsearch, but on the Python 3.7 runtime.

Now that we have upgraded to the Python 3.11 runtime, I get this error when it tries to execute parallel_bulk:

EXCEPTION: [Errno 38] Function not implemented

It might be because Lambda doesn't allow the use of some parts of the Python multiprocessing package.

  • https://stackoverflow.com/questions/34005930/multiprocessing-semlock-is-not-implemented-when-running-on-aws-lambda
  • https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/
  • https://stackoverflow.com/questions/60816172/numba-issues-multiprocessing-userwarning-when-running-in-aws-lambda
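To check whether a given runtime is affected, here is a minimal sketch that probes for the lock primitive the linked answers point to. The helper name `semlock_available` is hypothetical, and the assumption (taken from the links above, not verified against Lambda here) is that creating a `multiprocessing.Lock` is what raises `[Errno 38]` in that environment.

```python
import multiprocessing


def semlock_available() -> bool:
    """Return True if multiprocessing locks can be created here.

    Hypothetical helper: multiprocessing.Lock() allocates a SemLock, which is
    assumed to be the call that fails with OSError (errno 38) on AWS Lambda.
    """
    try:
        multiprocessing.Lock()
    except (ImportError, OSError):
        # ImportError: the platform has no working sem_open implementation.
        # OSError: the call exists but is not implemented (e.g. Errno 38).
        return False
    return True


if __name__ == "__main__":
    print("SemLock available:", semlock_available())
```

Running this in the Lambda handler and logging the result would show whether the environment, rather than the helper itself, is the cause.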

Steps to reproduce: Try uploading documents to Elasticsearch from AWS Lambda on the Python 3.11 runtime using the parallel_bulk helper.

Bryson14 avatar Sep 28 '23 21:09 Bryson14

Thanks for the report. We may want to try to make this work on AWS Lambda, but I'm confused: how could this work on Python 3.7, since multiprocessing.pool.ThreadPool does not work on AWS Lambda?

Also, can you please share the full exception/traceback?

pquentin avatar Oct 02 '23 06:10 pquentin

Closing, but I'll reopen if I get more details. Thank you!

pquentin avatar Nov 30 '23 12:11 pquentin

Got another report that this indeed fails starting with Python 3.8, and the links above give possible workarounds to support AWS Lambda. I also now understand why it works on Python 3.7: Python 3.8 and above use SemLock, which isn't supported by AWS Lambda.

We still want to use the faster ThreadPool when possible, but fall back to Pipe when it's not available.
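As a rough illustration of that idea (not the actual elasticsearch-py change), the sketch below tries `ThreadPool` first and, if constructing it raises `OSError`, falls back to one `multiprocessing.Process` per item with results sent back over a `Pipe`, in the style of the AWS blog post linked earlier. The names `parallel_map`, `map_with_pipes` and `_pipe_worker` are hypothetical, and starting one process per item is a simplification; real code would likely batch items into chunks.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool


def _pipe_worker(conn, func, item):
    # Runs in a child process: send back (ok, value) and close our end.
    try:
        conn.send((True, func(item)))
    except Exception as exc:
        conn.send((False, exc))
    finally:
        conn.close()


def map_with_pipes(func, items):
    """Apply func to each item in its own process, collecting results via Pipe."""
    jobs = []
    for item in items:
        parent_conn, child_conn = multiprocessing.Pipe()
        proc = multiprocessing.Process(
            target=_pipe_worker, args=(child_conn, func, item)
        )
        proc.start()
        child_conn.close()  # the parent only reads from parent_conn
        jobs.append((proc, parent_conn))

    results = []
    for proc, parent_conn in jobs:
        ok, value = parent_conn.recv()
        proc.join()
        if not ok:
            raise value
        results.append(value)
    return results


def parallel_map(func, items, thread_count=4):
    """Use ThreadPool when it can be created, otherwise fall back to pipes."""
    try:
        pool = ThreadPool(thread_count)
    except OSError:
        # Assumed failure mode on AWS Lambda: [Errno 38] Function not implemented.
        return map_with_pipes(func, items)
    with pool:
        return pool.map(func, items)


if __name__ == "__main__":
    print(parallel_map(abs, [-1, 2, -3]))
```

On a machine where `ThreadPool` works, only the first path runs; the pipe path is there for environments where pool creation fails.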

This is very unlikely to be backported to elasticsearch-py 7.x, but should be available in a later elasticsearch-py 8.x version. (The migration path is easier now, with changes to the body parameter that went in 8.12.)

pquentin avatar Mar 12 '24 05:03 pquentin