
Multi-model endpoint workers die when sklearn entrypoint imports package installed with requirements.txt

Open gavinmh opened this issue 5 years ago • 7 comments

Multi-model endpoint workers die when the entry point imports a package installed through requirements.txt. The package is installed successfully and the endpoint is created successfully, but inference requests always fail.

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
  "code": 500,
  "type": "InternalServerException",
  "message": "Worker died."
}

To reproduce: include a requirements.txt in the source_dir and import the installed package in the entry point script or in model_fn.

The gist at https://gist.github.com/gavinmh/267bc34ddedaf0931151a901859e165d modifies the sklearn_multi_model_endpoint_home_value.ipynb example notebook to reproduce the problem.

In particular, it adds:

%%writefile $SOURCE_DIR/requirements.txt
shap

Expected behavior: shap is imported successfully.
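Because the 500 response only says "Worker died.", one way to surface the underlying failure is to wrap the import in the entry point and write the full traceback to stderr. The following is a minimal diagnostic sketch, not a fix: `try_import` is a hypothetical helper (it is not part of the SageMaker SDK or the scikit-learn container), and it assumes that the worker's stderr ends up in the endpoint's logs. It can only catch Python-level exceptions; a worker killed by a crash outside Python would not be caught this way.

```python
import importlib
import sys
import traceback


def try_import(name):
    """Try to import `name`.

    Returns (module, None) on success, or (None, traceback_text) if the
    import raised any Python exception. Hypothetical helper for debugging.
    """
    try:
        return importlib.import_module(name), None
    except Exception:
        # Catch Exception rather than ImportError only, because a package
        # can be installed and still fail while initializing.
        return None, traceback.format_exc()


def report_import(name):
    """Import `name` and print diagnostics to stderr if it fails."""
    module, error = try_import(name)
    if error is not None:
        print(f"Import of {name!r} failed:\n{error}", file=sys.stderr)
        print(f"sys.path = {sys.path}", file=sys.stderr)
    return module
```

In the entry point script, calling `report_import("shap")` at the top of `model_fn` (instead of a bare `import shap`) would log the traceback and the module search path before the worker exits.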

Screenshots or logs: two images were attached to the original issue.

System information:

  • SageMaker Python SDK version: 2.3.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): sklearn
  • Framework version: 0.23-1
  • Python version: 3.7
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N


gavinmh avatar Aug 14 '20 16:08 gavinmh

Hello @gavinmh

Thank you for using Amazon SageMaker. We are looking into your issue. Will get back to you with an update by 2020-08-19 17:00 Pacific time.

Best regards

metrizable avatar Aug 16 '20 02:08 metrizable

Do you have any updates to share, @metrizable?

gavinmh avatar Aug 20 '20 17:08 gavinmh

Hi @gavinmh, sorry for the delay. We're passing this along to the team that maintains the scikit-learn container.

ajaykarpur avatar Aug 20 '20 22:08 ajaykarpur

@edwardjkim Would you be able to take a look?

ajaykarpur avatar Aug 20 '20 22:08 ajaykarpur

Any updates, @edwardjkim?

gavinmh avatar Aug 24 '20 20:08 gavinmh

Hi @gavinmh, did the endpoint run successfully when it was deployed without the requirements.txt? It looks like you are modifying the scikit-learn MME notebook, which to my knowledge does not work with Python SDK 2.0. Could you try again after pinning the Python SDK to 1.x with pip install sagemaker==1.* (and possibly restarting the kernel)?
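To confirm which major version of the SDK the notebook kernel actually picked up after pinning, a quick check along these lines may help. This is only a sketch: `major_version` and `installed_sagemaker_version` are hypothetical helper names, and the check assumes the version string starts with an integer major component such as `2.3.0`.

```python
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a dotted version string ('2.3.0' -> 2)."""
    return int(version.split(".")[0])


def installed_sagemaker_version() -> Optional[str]:
    """Return sagemaker.__version__, or None if the SDK is not importable."""
    try:
        import sagemaker
    except ImportError:
        return None
    return sagemaker.__version__


version = installed_sagemaker_version()
if version is None:
    print("sagemaker is not installed in this kernel")
elif major_version(version) != 1:
    print(f"sagemaker {version} is installed; the suggestion above is to try 1.x")
else:
    print(f"sagemaker {version} is installed (1.x)")
```

If the printed version is still 2.x after running the pip command, the kernel most likely needs a restart before the pinned version is picked up.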

@fm1ch4 is the author of the notebook from the MME team. @fm1ch4 Can you please take a look?

edwardjkim avatar Aug 25 '20 04:08 edwardjkim

Hi @gavinmh, can you please confirm whether this issue still exists with the latest version of sagemaker?

akrishna1995 avatar Dec 28 '23 19:12 akrishna1995