[BUG] mlflow sagemaker deploy fails to create endpoint
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.15.3
- MLflow installed from (source or binary):
- MLflow version (run mlflow --version): 1.6
- Python version: 3.8.1
- npm version, if running the dev UI:
- Exact command to reproduce:
Describe the problem
SageMaker endpoint creation failed after running the CLI command:
mlflow sagemaker deploy -a $APP_NAME --model-uri $MODEL_URI -e $ROLE --region-name $REGION
(see the CloudWatch logs below)
Code to reproduce issue
export APP_NAME=keras-model-1
export MODEL_URI=/Users/sigmarabi1/Environments/R_Env/mlruns/0/5ec40745c50b4e7db12949d17328f74b/artifacts/keras_model
export REGION=us-east-2
export ROLE=arn:aws:iam:::role/service-role/AmazonSageMaker-ExecutionRole-
mlflow sagemaker deploy -a $APP_NAME --model-uri $MODEL_URI -e $ROLE --region-name $REGION
Other info / logs
Timestamp | Message |
---|---|
2020-02-27T00:37:35.875-05:00 | UnavailableInvalidChannel: The channel is not accessible or is invalid. channel name: d channel url: https://conda.anaconda.org/d error code: 404 |
2020-02-27T00:37:35.875-05:00 | You will need to adjust your conda configuration to proceed. |
2020-02-27T00:37:35.875-05:00 | Use conda config --show channels to view your configuration's current state, |
2020-02-27T00:37:35.876-05:00 | and use conda config --show-sources to view config file locations. |
2020-02-27T00:37:35.876-05:00 | Traceback (most recent call last): File " |
@sigmarabi1 just to confirm: have you pushed a container up to AWS first using mlflow sagemaker build-and-push-container? If so, you'll need to make sure you are referencing the image location with -i. Outlined here.
This looks like something a missing container might cause.
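The two-step flow suggested above (build and push the serving container, then deploy while pointing at the pushed image with -i) can be sketched as a small Python wrapper around the CLI. This is a hypothetical helper, not part of MLflow: the function names and all argument values are illustrative placeholders, and only the CLI flags already used in this thread (-a, --model-uri, -e, --region-name, -i) are emitted. By default it only prints the commands; pass dry_run=False to actually run them.

```python
import shlex
import subprocess
from typing import List


def sagemaker_deploy_commands(app_name: str, model_uri: str, role_arn: str,
                              region: str, image_url: str) -> List[List[str]]:
    """Hypothetical helper: argv lists for (1) build/push the container, (2) deploy with -i."""
    build = ["mlflow", "sagemaker", "build-and-push-container"]
    deploy = [
        "mlflow", "sagemaker", "deploy",
        "-a", app_name,
        "--model-uri", model_uri,
        "-e", role_arn,
        "--region-name", region,
        "-i", image_url,  # must reference the image pushed in step (1)
    ]
    return [build, deploy]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    for argv in commands:
        print(shlex.join(argv))  # show the exact command line
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute your own app name, model URI, role ARN and ECR image URL.
    commands = sagemaker_deploy_commands(
        app_name="keras-model-1",
        model_uri="/path/to/artifacts/keras_model",
        role_arn="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
        region="us-east-2",
        image_url="<account-id>.dkr.ecr.us-east-2.amazonaws.com/mlflow-pyfunc:<tag>",
    )
    run_all(commands)  # dry run: prints the commands without executing them
```

Keeping the commands as argument lists (rather than one shell string) avoids quoting problems with paths and ARNs, and check=True makes the deploy step not run if the container build fails.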
@AdamBarnhard Yes, the container was built before the deploy command. I just ran it again with -i plus the image URL and got the same result. The Python API fails for me as well:
import mlflow.sagemaker as mfs
mfs.deploy(app_name=app_name, model_uri=model_uri, region_name=region, mode="replace", execution_role_arn=arn, image_url=image_url)
@sigmarabi1 what does your mlflow.pyfunc.log_model(..) call look like? Are you passing a conda environment to it?
To debug, I'd suggest running mlflow sagemaker run-local until you can get it to work; it should also print more logs. If the local version works, you might have to work with AWS directly to debug.
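Once mlflow sagemaker run-local has started the container, a quick way to tell whether the model server actually came up is to call it from the host. A minimal sketch, assuming the SageMaker-style /ping and /invocations routes and a JSON body in pandas "split" orientation with the content type application/json; format=pandas-split (check the scoring-server docs for your MLflow version). The helper names, column names and port are illustrative; 8888 is taken from the docker run -p 8888:8080 mapping shown in this thread.

```python
import json
import urllib.request
from typing import Any, List


def ping_request(port: int) -> urllib.request.Request:
    # Health-check route of SageMaker-style serving containers (assumed here).
    return urllib.request.Request(f"http://localhost:{port}/ping", method="GET")


def invocation_request(port: int, columns: List[str],
                       rows: List[List[Any]]) -> urllib.request.Request:
    # Pandas "split" orientation: column names plus a list of rows.
    body = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        f"http://localhost:{port}/invocations",
        data=body,
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> str:
    # Raises urllib.error.URLError if nothing is listening on the port,
    # or urllib.error.HTTPError if the server answers with an error status.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the container running, send(ping_request(8888)) should return without raising; if it fails with a connection error, the server never started, which points back at the container logs (here, the conda environment creation) rather than at SageMaker.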
@AdamBarnhard The model was logged with the R API as follows:
library(mlflow)
library(keras)
with(mlflow_start_run(), {
  model_keras <- keras_model_sequential()
  model_keras %>%
    layer_dense(units = 16, kernel_initializer = "uniform", activation = "relu", input_shape = ncol(x_train_tbl)) %>%
    layer_dropout(rate = 0.1) %>%
    layer_dense(units = 1, kernel_initializer = "uniform", activation = "sigmoid") %>%
    compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = c('accuracy'))
  epochs <- mlflow_log_param("epochs", mlflow_param("epochs", 15))
  batch_size <- mlflow_log_param("batch_size", mlflow_param("batch_size", 60))
  history <- fit(object = model_keras, x = as.matrix(x_train_tbl), y = y_train_vec, batch_size = batch_size, epochs = epochs, validation_split = 0.2, callbacks = list(mlflow_logger))
  mlflow_log_model(model = model_keras, "keras_model_st_churn")
})
@AdamBarnhard I cannot figure out what I'm missing.
When I run it locally:
mlflow sagemaker run-local -m $MODEL_PATH -p $LOCAL_PORT
it fails at the following step:
docker run -v /Users/sigmarabi1/Environments/mlflowEnv/mlruns/0/795d02ab77c541c7b636900630a3839e/artifacts/keras_model_st_churn:/opt/ml/model/ -p 8888:8080 -e MLFLOW_DEPLOYMENT_FLAVOR_NAME=python_function --rm mlflow-pyfunc serve
Collecting package metadata (repodata.json): ...working... failed
UnavailableInvalidChannel: The channel is not accessible or is invalid. channel name: d channel url: https://conda.anaconda.org/d error code: 404
You will need to adjust your conda configuration to proceed.
Use conda config --show channels to view your configuration's current state,
and use conda config --show-sources to view config file locations.
creating and activating custom environment
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 48, in _init
    _serve()
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 74, in _serve
    _serve_pyfunc(m)
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 126, in _serve_pyfunc
    _install_pyfunc_deps(MODEL_PATH, install_mlflow=True)
  File "/miniconda/lib/python3.7/site-packages/mlflow/models/container/__init__.py", line 104, in _install_pyfunc_deps
    raise Exception("Failed to create model environment.")
Exception: Failed to create model environment.
Hi @sigmarabi1, I am also having trouble running mlflow sagemaker run-local, but my error is the following (which seems like something you have resolved). Can you help me resolve this issue? (The model is XGBoost.)
Unable to find image 'mlflow-pyfunc:latest' locally
docker: Error response from daemon: pull access denied for mlflow-pyfunc, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
Hi,
I had a similar issue, and it was a problem with conda, not MLflow. As far as I can tell, it is probably a typo in the conda.yaml file: conda has no channel named d (as the error states), but it does have a channel named defaults.
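Since the error names a channel called d, it is worth inspecting the channels entry of the conda.yaml stored next to the model before rebuilding anything. A minimal sketch that works on the already-parsed YAML (for example the dict returned by PyYAML's yaml.safe_load). The helper name and both heuristics (a bare string where a list is expected, one-character channel names) are assumptions for illustration, not checks that conda or MLflow perform.

```python
from typing import Any, Dict, List


def channel_problems(env: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: warnings about the 'channels' entry of a parsed conda env."""
    problems: List[str] = []
    channels = env.get("channels") or []
    if isinstance(channels, str):
        # A scalar where conda expects a list of channel names.
        problems.append(f"'channels' is a string ({channels!r}), expected a list")
        channels = [channels]
    for name in channels:
        # One-character names such as 'd' may mean a channel name was split into letters.
        if not isinstance(name, str) or len(name.strip()) <= 1:
            problems.append(f"suspicious channel name: {name!r}")
    return problems


if __name__ == "__main__":
    broken = {"name": "mlflow-env", "channels": list("defaults")}  # ['d', 'e', 'f', ...]
    fixed = {"name": "mlflow-env", "channels": ["defaults"]}
    print(channel_problems(broken))  # one warning per single-letter channel
    print(channel_problems(fixed))   # []
```

The check is deliberately shallow: it only flags names that cannot plausibly be real channels, so an empty result does not guarantee that every listed channel is reachable.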
As I said, I had a similar issue, but mine was caused by a Python version that was missing from conda. My venv runs Python 3.8.9, and unfortunately this version was not available in conda; the solution was to change the Python version in the conda.yaml file to 3.8.10. Hope this helps anyone digging here: if your serving fails within the miniconda script, it is very likely related to conda itself.
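The Python-version fix described above amounts to editing one entry in the dependencies list of conda.yaml. A sketch of that edit on the parsed YAML; the helper name is hypothetical, and which Python versions your conda channels actually provide still has to be checked separately (for example with conda search python).

```python
import re
from typing import Any, Dict

# Matches 'python', 'python=3.8.9', 'python==3.8.9', 'python>=3.8' but not 'python-dateutil'.
_PYTHON_SPEC = re.compile(r"^python(?:\s*[=<>!~].*)?$")


def set_python_version(env: Dict[str, Any], version: str) -> Dict[str, Any]:
    """Hypothetical helper: copy of a conda env dict with its python dependency pinned to `version`."""
    deps = []
    replaced = False
    for dep in env.get("dependencies") or []:
        if isinstance(dep, str) and _PYTHON_SPEC.match(dep.strip()):
            deps.append(f"python={version}")
            replaced = True
        else:
            deps.append(dep)  # keep other packages and any nested {'pip': [...]} entry
    if not replaced:
        deps.insert(0, f"python={version}")  # add a pin if the env had none
    return {**env, "dependencies": deps}


if __name__ == "__main__":
    env = {
        "name": "mlflow-env",
        "channels": ["defaults"],
        "dependencies": ["python=3.8.9", "pip", {"pip": ["mlflow"]}],
    }
    print(set_python_version(env, "3.8.10")["dependencies"])
```

Returning a new dict instead of mutating the input makes it easy to diff the old and new environments before writing the result back to conda.yaml.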
I'm also having this error, but I don't see how it has anything to do with conda.