[BUG] mlflow sagemaker deploy fails to create endpoint

Open sigmarabi1 opened this issue 5 years ago • 8 comments

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.15.3
  • MLflow installed from (source or binary):
  • MLflow version (run mlflow --version): 1.6
  • Python version: 3.8.1
  • npm version, if running the dev UI:
  • Exact command to reproduce:

Describe the problem

SageMaker endpoint creation failed after running the CLI command: mlflow sagemaker deploy -a $APP_NAME --model-uri $MODEL_URI -e $ROLE --region-name $REGION

(see logs from cloudwatch below)

Code to reproduce issue

export APP_NAME=keras-model-1
export MODEL_URI=/Users/sigmarabi1/Environments/R_Env/mlruns/0/5ec40745c50b4e7db12949d17328f74b/artifacts/keras_model
export REGION=us-east-2
export ROLE=arn:aws:iam:::role/service-role/AmazonSageMaker-ExecutionRole-

mlflow sagemaker deploy -a $APP_NAME --model-uri $MODEL_URI -e $ROLE --region-name $REGION

Other info / logs

Timestamp Message
2020-02-27T00:37:35.875-05:00 UnavailableInvalidChannel: The channel is not accessible or is invalid. channel name: d channel url: https://conda.anaconda.org/d error code: 404
2020-02-27T00:37:35.875-05:00 You will need to adjust your conda configuration to proceed.
2020-02-27T00:37:35.875-05:00 Use conda config --show channels to view your configuration's current state,
2020-02-27T00:37:35.876-05:00 and use conda config --show-sources to view config file locations.
2020-02-27T00:37:35.876-05:00 Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 48, in _init
    _serve()
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 74, in _serve
    _serve_pyfunc(m)
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 126, in _serve_pyfunc
    _install_pyfunc_deps(MODEL_PATH, install_mlflow=True)
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 104, in _install_pyfunc_deps
    raise Exception("Failed to create model environment.")

sigmarabi1, Feb 27 '20 06:02

@sigmarabi1 just to confirm - have you pushed a container up to aws first using mlflow sagemaker build-and-push-container? If so, you'll need to make sure you are referencing the image location with -i. Outlined here.

This looks like something a missing container might cause.

AdamBarnhard, Feb 27 '20 06:02

@AdamBarnhard Yes, the container was built before I ran the deploy command. I just ran it again with -i plus the image URL and got the same result. The Python API fails for me as well:

import mlflow.sagemaker as mfs

mfs.deploy(app_name=app_name, model_uri=model_uri, region_name=region, mode="replace", execution_role_arn=arn, image_url=image_url)

sigmarabi1, Feb 27 '20 13:02

@sigmarabi1 what does your mlflow.pyfunc.log_model(..) look like? Are you sending in a conda environment with that?

To debug, I'd suggest running mlflow sagemaker run-local until you can get it to work. There should also be more logs when you run that. If the local version works, you might have to work with AWS directly to debug.
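As a rough aid for reading the run-local output, the sketch below builds the command and scans captured log text for error strings of the kind quoted in this thread. The function names and the hint wording are illustrative assumptions, not part of MLflow, and the only flags used are the -m and -p flags that appear in this thread.

```python
import shlex

# Error markers of the kind quoted in this thread. The hints are an
# interpretation of the discussion here, not official MLflow guidance.
KNOWN_FAILURES = {
    "UnavailableInvalidChannel": "conda rejected a channel; inspect the channels entry in the model's conda.yaml",
    "Failed to create model environment": "the container could not build the model's conda environment",
    "pull access denied": "the serving image was not found locally; build the container first",
}


def run_local_command(model_path, port):
    """Return the argv for `mlflow sagemaker run-local` (hypothetical helper)."""
    return ["mlflow", "sagemaker", "run-local", "-m", str(model_path), "-p", str(port)]


def diagnose(log_text):
    """Return the hints whose marker string appears in the captured log text."""
    return [hint for marker, hint in KNOWN_FAILURES.items() if marker in log_text]


if __name__ == "__main__":
    # Placeholder path and port; the command is printed rather than executed
    # so that the sketch has no side effects.
    print(shlex.join(run_local_command("/path/to/model", 8888)))
```

To use it, save the output of the run-local command to a file and pass the file's text to diagnose(); an empty list means none of the known markers were found.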

AdamBarnhard, Feb 27 '20 15:02

@AdamBarnhard I cannot figure out what I'm missing.

when I run local: mlflow sagemaker run-local -m $MODEL_PATH -p $LOCAL_PORT

it fails in the following step:

docker run -v /Users/sigmarabi1/Environments/mlflowEnv/mlruns/0/795d02ab77c541c7b636900630a3839e/artifacts/keras_model_st_churn:/opt/ml/model/ -p 8888:8080 -e MLFLOW_DEPLOYMENT_FLAVOR_NAME=python_function --rm mlflow-pyfunc serve

Collecting package metadata (repodata.json): ...working... failed

UnavailableInvalidChannel: The channel is not accessible or is invalid. channel name: d channel url: https://conda.anaconda.org/d error code: 404

You will need to adjust your conda configuration to proceed. Use conda config --show channels to view your configuration's current state, and use conda config --show-sources to view config file locations.

creating and activating custom environment
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 48, in _init
    _serve()
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 74, in _serve
    _serve_pyfunc(m)
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 126, in _serve_pyfunc
    _install_pyfunc_deps(MODEL_PATH, install_mlflow=True)
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 104, in _install_pyfunc_deps
    raise Exception("Failed to create model environment.")
Exception: Failed to create model environment.

sigmarabi1, Feb 28 '20 07:02

@AdamBarnhard The model was logged with the R API as follows:

with (mlflow_start_run(), {

  model_keras <- keras_model_sequential()

  model_keras %>%
    layer_dense(
      units = 16,
      kernel_initializer = "uniform",
      activation = "relu",
      input_shape = ncol(x_train_tbl)) %>%
    layer_dropout(rate = 0.1) %>%
    layer_dense(
      units = 1,
      kernel_initializer = "uniform",
      activation = "sigmoid") %>%
    compile(
      optimizer = 'adam',
      loss = 'binary_crossentropy',
      metrics = c('accuracy')
    )

  epochs <- mlflow_log_param("epochs", mlflow_param("epochs", 15))
  batch_size <- mlflow_log_param("batch_size", mlflow_param("batch_size", 60))

  history <- fit(
    object = model_keras,
    x = as.matrix(x_train_tbl),
    y = y_train_vec,
    batch_size = batch_size,
    epochs = epochs,
    validation_split = 0.2,
    callbacks = list(mlflow_logger)
  )

  mlflow_log_model(model=model_keras,"keras_model_st_churn")

})

sigmarabi1, Feb 28 '20 08:02

Hi @sigmarabi1, I am also having trouble running mlflow sagemaker run-local, but my error is the following (which looks like something you have already resolved). Can you help me resolve it? (The model is XGBoost.)

Unable to find image 'mlflow-pyfunc:latest' locally
docker: Error response from daemon: pull access denied for mlflow-pyfunc, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
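The message above suggests that Docker did not find an image named mlflow-pyfunc locally and then tried to pull it from a registry. A minimal sketch for checking whether the image is present before running run-local follows; it assumes the docker CLI is on the PATH, and image_exists is a hypothetical helper name. Earlier in this thread, mlflow sagemaker build-and-push-container was named as the step that creates the image.

```python
import subprocess


def image_exists(image, run=subprocess.run):
    """Return True if `docker image inspect` finds the image locally.

    `run` is injectable so the check can be exercised without Docker.
    """
    try:
        result = run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker executable itself is missing.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    name = "mlflow-pyfunc"  # image name taken from the error message above
    print(f"{name} present locally: {image_exists(name)}")
```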

sai-krishna-msk, Apr 28 '21 12:04

Hi,

I had a similar issue, and it was a problem with conda, not MLflow. As far as I can tell, yours is probably a typo in the conda.yaml file, because conda has no channel named d (as the error states) but does have a channel named defaults.

As I said, I had a similar issue, but mine was caused by a Python version that is missing from conda. My venv runs Python 3.8.9, which unfortunately was not available in conda; the solution was to change the Python version in conda.yaml to 3.8.10. I hope this helps someone digging here: if your serving fails within the miniconda script, it is very likely related to conda itself.
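A small sketch of how the model's conda.yaml could be sanity-checked for the two problems described above. It works on an already parsed mapping, and check_conda_env is a hypothetical helper, not an MLflow function. One way a channel named d could arise (an assumption, not confirmed in this thread) is a channels entry that holds the plain string defaults instead of a list, so that the string is read character by character.

```python
def check_conda_env(env):
    """Return a list of warnings for a parsed conda.yaml mapping."""
    warnings = []

    channels = env.get("channels")
    if isinstance(channels, str):
        # Assumed failure mode: a bare string where a list is expected.
        warnings.append(
            f"channels is the string {channels!r}, expected a list such as ['defaults']"
        )
    elif not channels:
        warnings.append("no channels listed")

    deps = env.get("dependencies") or []
    python_pins = [
        d for d in deps
        if isinstance(d, str) and d.replace(" ", "").startswith("python=")
    ]
    if not python_pins:
        warnings.append("no python version pinned in dependencies")

    return warnings


if __name__ == "__main__":
    # Illustrative environment; the values are made up for the example.
    example = {"channels": "defaults", "dependencies": ["python=3.8.10"]}
    for message in check_conda_env(example):
        print(message)
```

If PyYAML is installed, the mapping can be obtained with yaml.safe_load on the conda.yaml file in the model's artifact directory. Whether a pinned Python version is actually available still has to be checked against the conda channels themselves.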

SimonMolinsky, Feb 03 '22 20:02

I'm also having this error, but I don't see how it has anything to do with conda

gegnew, Jan 31 '24 11:01