azure-sdk-for-python

Deployment to Managed endpoint is crashing

sujithaleshwaram99 opened this issue 3 years ago

Describe the bug
I tried to deploy a model to a managed online endpoint using SDK v2. I provided all the required configuration (Docker image, model pickle file, and scoring script), but the deployment crashes while provisioning. Please help me fix it. Thank you.

Exception or Stack Trace
See the container events and logs below.

To Reproduce
Run the deployment code below.

Code Snippet

    model = Model(path="./Data-Science/Trained_Model/model.pkl")
    env = Environment(
        conda_file="./Data-Science/Trained_Model/conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:20220616.v1",
    )

    blue_deployment = ManagedOnlineDeployment(
        name="d-one",
        endpoint_name=online_endpoint_name,
        model=model,
        environment=env,
        code_configuration=CodeConfiguration(
            code="./Data-Science/scoring", scoring_script="score.py"
        ),
        instance_type="Standard_F4s_v2",
        instance_count=1,
    )
    ml_client.online_deployments.begin_create_or_update(deployment=blue_deployment)

Expected behavior
Deployment successful.

Screenshots

Instance status:

    SystemSetup: Succeeded
    UserContainerImagePull: Succeeded
    ModelDownload: Succeeded
    UserContainerStart: InProgress

Container events:

    Kind: Pod, Name: Pulling, Type: Normal, Time: 2022-09-21T13:12:04.810312Z, Message: Start pulling container image
    Kind: Pod, Name: Downloading, Type: Normal, Time: 2022-09-21T13:12:04.857318Z, Message: Start downloading models
    Kind: Pod, Name: Pulled, Type: Normal, Time: 2022-09-21T13:13:46.824143Z, Message: Container image is pulled successfully
    Kind: Pod, Name: Downloaded, Type: Normal, Time: 2022-09-21T13:13:46.824143Z, Message: Models are downloaded successfully
    Kind: Pod, Name: Created, Type: Normal, Time: 2022-09-21T13:13:46.973159Z, Message: Created container inference-server
    Kind: Pod, Name: Started, Type: Normal, Time: 2022-09-21T13:13:47.06337Z, Message: Started container inference-server

Container logs:

    /bin/bash: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/libtinfo.so.6: no version information available (required by /bin/bash)
    /bin/bash: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/libtinfo.so.6: no version information available (required by /bin/bash)
    /bin/bash: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/libtinfo.so.6: no version information available (required by /bin/bash)
    /bin/bash: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/libtinfo.so.6: no version information available (required by /bin/bash)
    2022-09-21T13:13:47,058798056+00:00 - gunicorn/run
    2022-09-21T13:13:47,060153990+00:00 - rsyslog/run
    2022-09-21T13:13:47,060869707+00:00 | gunicorn/run | bash: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/libtinfo.so.6: no version information available (required by bash)
    2022-09-21T13:13:47,062360944+00:00 | gunicorn/run | ###############################################
    2022-09-21T13:13:47,062846656+00:00 - iot-server/run
    2022-09-21T13:13:47,064079287+00:00 | gunicorn/run | AzureML Container Runtime Information
    2022-09-21T13:13:47,064829705+00:00 - nginx/run
    2022-09-21T13:13:47,065537123+00:00 | gunicorn/run | ###############################################
    2022-09-21T13:13:47,066757553+00:00 | gunicorn/run |
    2022-09-21T13:13:47,068248890+00:00 | gunicorn/run |
    2022-09-21T13:13:47,070585448+00:00 | gunicorn/run | AzureML image information: openmpi4.1.0-ubuntu20.04, Materializaton Build:20220616.v11
    2022-09-21T13:13:47,072696200+00:00 | gunicorn/run |
    2022-09-21T13:13:47,075219262+00:00 | gunicorn/run | rsyslogd: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/libuuid.so.1: no version information available (required by rsyslogd)
    2022-09-21T13:13:47,079050357+00:00 | gunicorn/run | PATH environment variable: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    2022-09-21T13:13:47,087153358+00:00 | gunicorn/run | PYTHONPATH environment variable:
    2022-09-21T13:13:47,091169057+00:00 | gunicorn/run |
    2022-09-21T13:13:47,092708895+00:00 | gunicorn/run | Pip Dependencies (before dynamic installation)

EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting... /bin/bash: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/libtinfo.so.6: no version information available (required by /bin/bash) 2022-09-21T13:13:47,135979965+00:00 - iot-server/finish 1 0 2022-09-21T13:13:47,137118693+00:00 - Exit code 1 is normal. Not restarting iot-server. adal==1.2.7 alembic==1.8.1 applicationinsights==0.11.10 argcomplete==2.0.0 arviz==0.11.2 attrs==22.1.0 azure-common==1.1.28 azure-core==1.24.2 azure-graphrbac==0.61.1 azure-identity==1.10.0 azure-mgmt-authorization==2.0.0 azure-mgmt-containerregistry==10.0.0 azure-mgmt-core==1.3.1 azure-mgmt-keyvault==10.0.0 azure-mgmt-resource==21.1.0 azure-mgmt-storage==20.0.0 azure-storage-blob==12.9.0 azure-storage-queue==12.4.0 azureml-automl-core==1.44.0 azureml-automl-runtime==1.44.0 azureml-core==1.44.0 azureml-dataprep==4.2.2 azureml-dataprep-native==38.0.0 azureml-dataprep-rslex==2.8.1 azureml-dataset-runtime==1.44.0 azureml-defaults==1.44.0 azureml-inference-server-http==0.7.4 azureml-interpret==1.44.0 azureml-mlflow==1.44.0 azureml-pipeline-core==1.44.0 azureml-responsibleai==1.44.0 azureml-telemetry==1.44.0 azureml-train-automl-client==1.44.0 azureml-train-automl-runtime==1.44.0 azureml-train-core==1.44.0 azureml-train-restclients-hyperdrive==1.44.0 azureml-training-tabular==1.44.0 backcall==0.2.0 backports.tempfile==1.0 backports.weakref==1.0.post1 bcrypt==3.2.2 bokeh==2.4.3 boto==2.49.0 boto3==1.15.18 botocore==1.18.18 cachetools==5.2.0 certifi @ file:///opt/conda/conda-bld/certifi_1655968806487/work/certifi cffi==1.15.1 cftime==1.5.1.1 charset-normalizer==2.1.0 click==7.1.2 cloudpickle==1.6.0 cmdstanpy==0.9.5 configparser==3.7.4 contextlib2==21.6.0 convertdate @ file:///tmp/build/80754af9/convertdate_1634070773133/work cryptography==37.0.4 cycler @ file:///tmp/build/80754af9/cycler_1637851556182/work Cython==0.29.28 dask==2.30.0 databricks-cli==0.17.0 dataclasses==0.6 debugpy==1.6.2 decorator==5.1.1 dice-ml==0.8 dill==0.3.5.1 distributed==2.30.1 distro==1.7.0 docker==5.0.3 dotnetcore2==3.1.23 dowhy==0.8 econml==0.12.0 entrypoints==0.4 ephem @ file:///tmp/build/80754af9/ephem_1638960312619/work erroranalysis==0.3.6 fairlearn==0.7.0 fbprophet @ file:///home/conda/feedstock_root/build_artifacts/fbprophet_1599365534439/work fire==0.4.0 Flask==2.1.3 Flask-Cors==3.0.10 flatbuffers==2.0 fonttools==4.25.0 fsspec==2022.7.1 gensim==3.8.3 gitdb==4.0.9 GitPython==3.1.27 google-api-core==2.8.2 google-auth==2.9.1 googleapis-common-protos==1.56.4 greenlet==1.1.3 gunicorn==20.1.0 h5py==3.7.0 HeapDict==1.0.1 holidays @ file:///home/conda/feedstock_root/build_artifacts/holidays_1595448845196/work humanfriendly==10.0 idna==3.3 importlib-metadata==4.12.0 importlib-resources==5.9.0 inference-schema==1.4.2.1 interpret-community==0.26.0 interpret-core==0.2.7 ipykernel==6.6.0 ipython==7.34.0 isodate==0.6.1 itsdangerous==2.1.2 jedi==0.18.1 jeepney==0.8.0 Jinja2==2.11.2 jmespath==1.0.0 joblib==0.14.1 json-logging-py==0.2 jsonpickle==2.2.0 jsonschema==4.9.1 jupyter-client==7.3.4 jupyter-core==4.11.1 keras2onnx==1.6.0 kiwisolver @ file:///opt/conda/conda-bld/kiwisolver_1653292039266/work knack==0.9.0 korean-lunar-calendar @ file:///tmp/build/80754af9/korean_lunar_calendar_1634063020401/work lightgbm==3.2.1 llvmlite==0.38.1 locket==1.0.0 LunarCalendar @ file:///tmp/build/80754af9/lunarcalendar_1646383991234/work Mako==1.2.2 MarkupSafe==2.0.1 matplotlib @ file:///tmp/build/80754af9/matplotlib-suite_1634667019719/work matplotlib-inline==0.1.3 mkl-fft==1.3.0 
mkl-random==1.1.0 mkl-service==2.3.0 ml-wrappers==0.2.0 mlflow==1.29.0 mlflow-skinny==1.27.0 mpi4py==3.1.3 mpmath==1.2.1 msal==1.18.0 msal-extensions==1.0.0 msgpack==1.0.4 msrest==0.7.1 msrestazure==0.6.4 munkres==1.1.4 ndg-httpsclient==0.5.1 nest-asyncio==1.5.5 netCDF4==1.5.7 networkx==2.5 nimbusml==1.8.0 numba==0.55.2 numpy==1.18.5 oauthlib==3.2.0 onnx==1.12.0 onnxconverter-common==1.6.0 onnxmltools==1.4.1 onnxruntime==1.11.1 opencensus==0.10.0 opencensus-context==0.1.2 opencensus-ext-azure==1.1.5 packaging @ file:///tmp/build/80754af9/packaging_1637314298585/work pandas==1.1.5 paramiko==2.11.0 parso==0.8.3 partd==1.2.0 pathspec==0.9.0 patsy==0.5.2 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.0.1 pkginfo==1.8.3 pkgutil-resolve-name==1.3.10 ply==3.11 pmdarima==1.7.1 portalocker==2.5.1 prometheus-client==0.14.1 prometheus-flask-exporter==0.20.3 prompt-toolkit==3.0.30 protobuf==3.20.1 psutil @ file:///tmp/build/80754af9/psutil_1612298016854/work ptyprocess==0.7.0 py-cpuinfo==5.0.0 pyarrow==6.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.21 pydot==1.4.2 Pygments==2.12.0 PyJWT==2.4.0 PyMeeus @ file:///tmp/build/80754af9/pymeeus_1634069098549/work PyNaCl==1.5.0 pyOpenSSL==22.0.0 pyparsing @ file:///tmp/build/80754af9/pyparsing_1635766073266/work PyQt5-sip==12.11.0 pyrsistent==0.18.1 PySocks==1.7.1 pystan==2.19.1.1 python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work pytz @ file:///opt/conda/conda-bld/pytz_1654762638606/work PyYAML==6.0 pyzmq==23.2.0 querystring-parser==1.2.4 raiutils==0.2.0 requests==2.28.1 requests-oauthlib==1.3.1 responsibleai==0.19.0 rsa==4.9 s3transfer==0.3.7 scikit-learn==0.22.1 scipy==1.5.3 SecretStorage==3.3.2 semver==2.13.0 setuptools-git==1.2 shap==0.39.0 sip @ file:///tmp/abs_44cd77b_pu/croots/recipe/sip_1659012365470/work six @ file:///tmp/build/80754af9/six_1644875935023/work skl2onnx==1.4.9 sklearn-pandas==1.7.0 slicer==0.0.7 smart-open==1.9.0 smmap==5.0.0 sortedcontainers==2.4.0 sparse==0.13.0 SQLAlchemy==1.4.41 sqlparse==0.4.2 statsmodels==0.11.1 sympy==1.10.1 tabulate==0.8.10 tblib==1.7.0 termcolor==1.1.0 toml @ file:///tmp/build/80754af9/toml_1616166611790/work toolz==0.12.0 tornado @ file:///tmp/build/80754af9/tornado_1606942283357/work tqdm @ file:///opt/conda/conda-bld/tqdm_1650891076910/work traitlets==5.3.0 typing-extensions==4.1.1 urllib3==1.26.9 wcwidth==0.2.5 websocket-client==1.3.3 Werkzeug==2.2.1 wrapt==1.12.1 xarray==0.20.1 xgboost==1.3.3 zict==2.2.0 zipp==3.8.1

    2022-09-21T13:13:47,482009924+00:00 | gunicorn/run |
    2022-09-21T13:13:47,483196753+00:00 | gunicorn/run | ###############################################
    2022-09-21T13:13:47,484262079+00:00 | gunicorn/run | AzureML Inference Server
    2022-09-21T13:13:47,485307705+00:00 | gunicorn/run | ###############################################
    2022-09-21T13:13:47,486356731+00:00 | gunicorn/run |
    2022-09-21T13:13:47,487760366+00:00 | gunicorn/run |
    2022-09-21T13:13:47,488771491+00:00 | gunicorn/run | Starting HTTP server
    2022-09-21T13:13:47,489786616+00:00 | gunicorn/run | Starting gunicorn 20.1.0
    Listening at: http://127.0.0.1:31311 (13)
    Using worker: sync
    worker timeout is set to 300
    Booting worker with pid: 64
    SPARK_HOME not set. Skipping PySpark Initialization.
    Initializing logger
    2022-09-21 13:13:48,238 | root | INFO | Starting up app insights client
    logging socket was found. logging is available.
    logging socket was found. logging is available.
    2022-09-21 13:13:48,238 | root | INFO | Starting up request id generator
    2022-09-21 13:13:48,238 | root | INFO | Starting up app insight hooks
    2022-09-21 13:13:48,238 | root | INFO | Invoking user's init function
    2022-09-21 13:13:50,198 | azureml.core | WARNING | Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (azure-identity 1.10.0 (/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages), Requirement.parse('azure-identity==1.7.0'), {'azureml-dataprep'}).
    Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (azure-identity 1.10.0 (/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages), Requirement.parse('azure-identity==1.7.0'), {'azureml-dataprep'}).
    00000000-0000-0000-0000-000000000000,RuntimeError
    00000000-0000-0000-0000-000000000000,:
    00000000-0000-0000-0000-000000000000,module compiled against API version 0xe but this version of numpy is 0xd
    00000000-0000-0000-0000-000000000000,ImportError
    00000000-0000-0000-0000-000000000000,:
    00000000-0000-0000-0000-000000000000,numpy.core.multiarray failed to import
    00000000-0000-0000-0000-000000000000,
    The above exception was the direct cause of the following exception:
    00000000-0000-0000-0000-000000000000,SystemError
    00000000-0000-0000-0000-000000000000,:
    00000000-0000-0000-0000-000000000000, returned a result with an error set
    2022-09-21 13:13:50,861 | root | INFO | Users's init has completed successfully
    2022-09-21 13:13:50,863 | root | INFO | Skipping middleware: dbg_model_info as it's not enabled.
    2022-09-21 13:13:50,863 | root | INFO | Skipping middleware: dbg_resource_usage as it's not enabled.
    Generating swagger file: /tmp/tmp71mgbspf
    2022-09-21 13:13:50,865 | root | INFO | Scoring timeout setting is not found. Use default timeout: 3600000 ms
    Exception in worker process
    Traceback (most recent call last):
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
        worker.init_process()
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
        self.load_wsgi()
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
        self.wsgi = self.app.wsgi()
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
        self.callable = self.load()
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
        return self.load_wsgiapp()
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
        return util.import_app(self.app_uri)
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/gunicorn/util.py", line 359, in import_app
        mod = importlib.import_module(module)
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "", line 1006, in _gcd_import
      File "", line 983, in _find_and_load
      File "", line 967, in _find_and_load_unlocked
      File "", line 677, in _load_unlocked
      File "", line 728, in exec_module
      File "", line 219, in _call_with_frames_removed
      File "/var/azureml-server/entry.py", line 3, in
        app = create_app.create()
      File "/var/azureml-server/create_app.py", line 29, in create
        app.register_blueprint(main)
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/flask/scaffold.py", line 50, in wrapper_func
        return f(self, *args, **kwargs)
      File "/azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/python3.7/site-packages/flask/app.py", line 1022, in register_blueprint
        blueprint.register(self, options)
      File "/var/azureml-server/aml_blueprint.py", line 287, in register
        super(AMLBlueprint, self).register(app, options, first_registration)
    TypeError: register() takes 3 positional arguments but 4 were given
    Worker exiting (pid: 64)
    Shutting down: Master
    Reason: Worker failed to boot.
    /bin/bash: /azureml-envs/azureml_c8c00a9938a621a73a96bb638c5bb085/lib/libtinfo.so.6: no version information available (required by /bin/bash)
    2022-09-21T13:13:51,352103193+00:00 - gunicorn/finish 3 0
    2022-09-21T13:13:51,353207918+00:00 - Exit code 3 is not normal. Killing image.

Setup (please complete the following information):

  • Python Version: Python 3.8
  • SDK Version: V2

Additional context
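For reference, the deployment snippet above assumes that `ml_client` and `online_endpoint_name` have already been created. A minimal sketch of that setup with the v2 `azure-ai-ml` SDK might look like the following; the credential choice and the subscription, resource group, workspace, and endpoint names are placeholders, not values from this report.

    # Hypothetical setup for the deployment snippet above; all names are placeholders.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import ManagedOnlineEndpoint
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        credential=DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<workspace-name>",
    )

    # The managed online endpoint must exist before a deployment can target it.
    online_endpoint_name = "credit-endpoint"  # placeholder
    endpoint = ManagedOnlineEndpoint(name=online_endpoint_name, auth_mode="key")
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()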

Information Checklist
Kindly make sure that you have added all the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • [ ] Bug Description Added
  • [ ] Repro Steps Added
  • [ ] Setup information Added

sujithaleshwaram99 avatar Sep 21 '22 15:09 sujithaleshwaram99

Thank you for your feedback. This has been routed to the support team for assistance.

ghost avatar Sep 21 '22 17:09 ghost

Hi @sujithaleshwaram99, thank you for the feedback. I will forward this to @azureml-github for answers.

l0lawrence avatar Sep 21 '22 17:09 l0lawrence

Hi @l0lawrence, thanks for chasing it up. Any updates on this?

HammadKamran avatar Sep 30 '22 10:09 HammadKamran

Hello,

Thanks for reaching out. At first glance, it seems like your scoring script is using AzureML SDK v1. Could you clarify why you need that and/or share your script?

For your reference, this is a sample scoring script used with SDK v2: https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/model-1/onlinescoring/score.py
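In broad strokes, that sample follows the plain init/run contract with no SDK v1 imports. A minimal sketch in the same spirit looks roughly like the block below; the model file name and the `{"data": ...}` payload shape are assumptions, not taken verbatim from the linked file.

    # Sketch of a v2-style scoring script; "model.pkl" and the payload shape are assumptions.
    import json
    import logging
    import os

    import joblib
    import numpy


    def init():
        global model
        # AZUREML_MODEL_DIR is set by the deployment and points at the mounted model folder.
        model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model.pkl")
        model = joblib.load(model_path)
        logging.info("Init complete")


    def run(raw_data):
        # raw_data is the raw JSON request body as a string, not a file path.
        data = numpy.array(json.loads(raw_data)["data"])
        result = model.predict(data)
        logging.info("Request processed")
        return result.tolist()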

Thanks,

Hugo

hugoaponte avatar Oct 03 '22 20:10 hugoaponte

Hi @hugoaponte,

I have used the script below in score.py.

Code snippet:

    from azureml.core.model import Model
    import os
    import joblib
    import json
    import logging
    import numpy
    import pandas as pd


    def init():
        """
        This function is called when the container is initialized/started, typically after create/update of the deployment.
        You can write the logic here to perform init operations like caching the model in memory
        """
        global model
        # AZUREML_MODEL_DIR is an environment variable created during deployment.
        # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
        # Please provide your model's folder name if there is one
        model_path = Model.get_model_path(model_name="credit-ci")
        model_path = os.path.join(model_path, "model.pkl")
        model = joblib.load(model_path)
        logging.info("Init complete")


    def run(raw_data):
        """
        This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
        In the example we extract the data from the json input and call the scikit-learn model's predict()
        method and return the result back
        """
        logging.info("model 1: request received")
        # data = json.loads(raw_data)
        # data = numpy.array(data)
        # result = model.predict(data)
        with open(raw_data, 'r', encoding='utf-8') as f:
            data = json.load(f)
        df = pd.DataFrame.from_dict(data)
        sno = df["Sno"]
        df = df.drop("Sno", axis=1)
        proba = model.predict_proba(df)
        proba = pd.DataFrame(data=proba, columns=["ProbaGoodCredit", "ProbaBadCredit"])
        result = proba.to_json(orient="records")

        logging.info("Request processed")
        return result

sujithaleshwaram99 avatar Oct 05 '22 09:10 sujithaleshwaram99

Thanks. To deploy with SDK v2, you don't need SDK v1 in your scoring script anymore. The example I shared shows the way to load the model:

    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # Please provide your model's folder name if there is one
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)

Could you try using this approach?

Thanks,

Hugo

hugoaponte avatar Oct 05 '22 16:10 hugoaponte
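Applied to the script posted above, that suggestion would mean dropping the azureml.core import and resolving the model path from AZUREML_MODEL_DIR. The sketch below is illustrative only: the model file name and the record-oriented JSON payload are assumptions carried over from the posted script, and it also assumes the request body arrives in raw_data as a JSON string (the usual contract) rather than as a file path.

    # Illustrative adaptation of the posted score.py to the v2 convention described above.
    # "model.pkl", the "Sno" column, and the record-oriented JSON payload are assumptions;
    # adjust them to the real model and data.
    import json
    import logging
    import os

    import joblib
    import pandas as pd


    def init():
        global model
        # AZUREML_MODEL_DIR is set by the deployment and points at the mounted model folder.
        model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model.pkl")
        model = joblib.load(model_path)
        logging.info("Init complete")


    def run(raw_data):
        logging.info("model 1: request received")
        # raw_data is the JSON request body as a string, not a path to a file on disk.
        data = json.loads(raw_data)
        df = pd.DataFrame.from_dict(data)
        df = df.drop("Sno", axis=1)
        proba = model.predict_proba(df)
        proba = pd.DataFrame(data=proba, columns=["ProbaGoodCredit", "ProbaBadCredit"])
        logging.info("Request processed")
        return proba.to_json(orient="records")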

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

anandwana001 avatar Oct 15 '22 08:10 anandwana001

Hi @sujithaleshwaram99. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

ghost avatar Oct 20 '22 03:10 ghost

Hi @sujithaleshwaram99, since you haven’t asked that we “/unresolve” the issue, we’ll close this out. If you believe further discussion is needed, please add a comment “/unresolve” to reopen the issue.

ghost avatar Oct 27 '22 10:10 ghost