
REST requests do not work with AES / Istio Ingress, but work with Emissary Ingress

Open martysai opened this issue 2 years ago • 6 comments

Describe the bug

Hi. I've been using Seldon Core for a while, but I'm still confused about how to set up Ingress properly. Despite following (1) the setup tutorial and (2) the customization of the LightGBM server, I found it too complicated to make REST requests to the deployed iris classifier because of the Ingress.

For Ingress, I tried all three options provided by Seldon Core:

  1. Istio Ingress;
  2. Ambassador Edge Stack;
  3. Emissary Ingress (a.k.a. Ambassador API Gateway).

But only the last one worked. Below I explain the problems I faced with the first two Ingress options. I would like to know what I did wrong. There may also be real bugs that should be fixed in Seldon Core.

To reproduce

Istio

Here's the command which I used to set up Seldon Core with Istio:

helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --set usageMetrics.enabled=true \
    --set istio.enabled=true \
    --namespace seldon-system

Then I deployed the iris model described below, but ran into a problem with the seldon-controller-manager pod. I stopped here and switched to AES.

Ambassador Edge Stack

  1. First, I wrote the following Makefile:
# Environment variables
SELDON_NAMESPACE ?= seldon-system
SERVER_VALUES_YAML ?= lightgbm.yaml
BUILD_DIRECTORY ?= .
NODE_NAME ?= node
REGISTRY_NAME ?= node
AMBASSADOR_NAMESPACE ?= ambassador


# Setting up the Seldon Core services
.PHONY: install-seldon
install-seldon:
	kubectl create namespace ${SELDON_NAMESPACE} || true
	helm install seldon-core seldon-core-operator \
		--repo https://storage.googleapis.com/seldon-charts \
		--set ambassador.enabled=true \
		--set usageMetrics.enabled=true \
		--namespace ${SELDON_NAMESPACE}


# Setting up the Ambassador
.PHONY: install-ambassador
install-ambassador:
	kubectl create namespace ${AMBASSADOR_NAMESPACE} || true
	helm repo add datawire https://www.getambassador.io
	helm repo update
	helm install ambassador datawire/ambassador \
		--set image.repository=docker.io/datawire/ambassador \
		--set crds.keep=false \
		--set enableAES=false \
		--set replicaCount=1 \
		--namespace ${AMBASSADOR_NAMESPACE}

# Upgrade the seldon-core servers list via Helm
.PHONY: upgrade-servers-list
upgrade-servers-list:
	helm upgrade seldon-core  \
		../helm-charts/seldon-core-operator \
		--namespace ${SELDON_NAMESPACE} \
		--values k8s/servers/${SERVER_VALUES_YAML}
  2. Then I ran the following commands (the environment variables point to K8s manifests with the SeldonDeployment CRD and other resources):
make install-seldon
make install-ambassador
make upgrade-servers-list
  3. Finally, I deployed the iris model using the manifest from the documentation:
  • Manifest:
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: iris
spec:
  predictors:
  - graph:
      implementation: LIGHTGBM_SERVER
      modelUri: gs://seldon-models/lightgbm/iris/model_3.2.1
      name: irislgbm
    name: default
    replicas: 1
  • Deployment creation:
kubectl create -f iris.yaml
  4. But when I tried to make a curl request to the iris deployment via the ambassador service, I got stuck on a TLS issue that I could not fix even after following the steps described in the Ambassador documentation:
~$ curl -k -d '{"data": {"ndarray":[[1.0, 2.0, 3.0, 4.0]]}}' -X POST https://${AMBASSADOR_IP}/seldon/seldon/iris/api/v1.0/predictions -H "Content-Type: application/json"
curl: (7) Failed to connect to ${AMBASSADOR_IP} port 443 after 21 ms: Connection refused

Expected behaviour

Emissary Ingress

The pipeline above worked when I added the argument --set enableAES=false to the Ambassador helm installation command. Then I configured TLS according to the Ambassador documentation in order to handle https requests. Finally, I was able to get predictions from the iris classifier:

~$ curl -k -d '{"data": {"ndarray":[[1.0, 2.0, 3.0, 4.0]]}}' -X POST https://${AMBASSADOR_IP}/seldon/seldon/iris/api/v1.0/predictions -H "Content-Type: application/json"
{"data":{"names":["t:0","t:1","t:2"],"ndarray":[[0.006050583983476063,0.0036412248390291302,0.9903081911774948]]},"meta":{"requestPath":{"irislgbm":"msaidov/lightgbm:1.15.0-dev"}}}
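
For anyone reproducing this, the request above can be wrapped in a small helper so the same call is easy to repeat against different ingresses. This is only a sketch: the function names `seldon_predict_url` and `seldon_predict` are hypothetical, and the URL layout (`/seldon/<namespace>/<deployment>/api/v1.0/predictions`) is simply copied from the curl call above.

```shell
#!/usr/bin/env bash
# Sketch: helpers for calling a SeldonDeployment REST endpoint through the ingress.
# Function names are hypothetical; the URL layout mirrors the curl call in this issue.

# Print <scheme>://<host>/seldon/<namespace>/<deployment>/api/v1.0/predictions
# Args: host, namespace, deployment, scheme (optional, defaults to https)
seldon_predict_url() {
  local host="$1"
  local namespace="$2"
  local deployment="$3"
  local scheme="${4:-https}"
  printf '%s://%s/seldon/%s/%s/api/v1.0/predictions' \
    "$scheme" "$host" "$namespace" "$deployment"
}

# POST a JSON payload to the prediction endpoint.
# Args: host, namespace, deployment, JSON payload, scheme (optional)
# -k skips certificate verification, which is only reasonable for self-signed test certs.
seldon_predict() {
  local url
  url="$(seldon_predict_url "$1" "$2" "$3" "${5:-https}")"
  curl -k -sS -X POST \
    -H "Content-Type: application/json" \
    -d "$4" \
    "$url"
}
```

Usage would look like `seldon_predict "${AMBASSADOR_IP}" seldon iris '{"data": {"ndarray":[[1.0, 2.0, 3.0, 4.0]]}}'`. Passing `http` as the fifth argument targets the plain HTTP port instead, which may help tell a missing TLS listener apart from a routing problem.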

I expect Ambassador Edge Stack and Istio Ingress to behave the same way. Please explain how to achieve this, or fix the issue. Thank you in advance!

Environment

  • Cloud Provider: Kubernetes in Yandex Cloud
  • Kubernetes Cluster Version
~$ kubectl version --short
Client Version: v1.24.1
Kustomize Version: v4.5.4
Server Version: v1.21.5
  • Deployed Seldon System Images:
~$ kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml  | grep seldonio 
          value: docker.io/seldonio/seldon-core-executor:1.15.0-dev
        image: docker.io/seldonio/seldon-core-operator:1.15.0-dev

Model Details

  • Images of your model:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  creationTimestamp: "2022-07-14T10:55:53Z"
  generation: 1
  name: iris
  namespace: seldon
  resourceVersion: "33155936"
  uid: 16a9b7b1-9a6d-44ac-acca-fe47090f2992
spec:
  predictors:
  - graph:
      implementation: LIGHTGBM_SERVER
      modelUri: gs://seldon-models/lightgbm/iris/model_3.2.1
      name: irislgbm
    name: default
    replicas: 1
status:
  address:
    url: http://iris-default.seldon.svc.cluster.local:8000/api/v1.0/predictions
  conditions:
  - lastTransitionTime: "2022-07-14T10:56:19Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: DeploymentsReady
  - lastTransitionTime: "2022-07-14T10:55:54Z"
    reason: No HPAs defined
    status: "True"
    type: HpasReady
  - lastTransitionTime: "2022-07-14T10:55:54Z"
    reason: No KEDA resources defined
    status: "True"
    type: KedaReady
  - lastTransitionTime: "2022-07-14T10:55:54Z"
    reason: No PDBs defined
    status: "True"
    type: PdbsReady
  - lastTransitionTime: "2022-07-14T10:56:19Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-07-14T10:56:19Z"
    reason: All services created
    status: "True"
    type: ServicesReady
  - lastTransitionTime: "2022-07-14T10:56:19Z"
    reason: No VirtualServices defined
    status: "True"
    type: istioVirtualServicesReady
  deploymentStatus:
    iris-default-0-irislgbm:
      availableReplicas: 1
      replicas: 1
  replicas: 1
  state: Available
  • Logs of your model:
I lost those logs, but it was a classic 403 error due to connection refused.

martysai avatar Jul 14 '22 15:07 martysai

Can you explain further the issue with using Istio, and which version?

For Ambassador we only support V1 of their APIs presently.

ukclivecox avatar Jul 18 '22 05:07 ukclivecox

For Ambassador we only support V1 of their APIs presently.

@cliveseldon Does it mean that Edge Stack is no longer supported?

martysai avatar Jul 18 '22 09:07 martysai

We are looking to support it via an update which is in progress for the next release. However, this is still at the planning and initial implementation stage at present.

ukclivecox avatar Jul 18 '22 09:07 ukclivecox

@cliveseldon Do you recommend using Istio or Emissary in a production environment?

Can you explain further the issue with using Istio, and which version?

I will provide the details about this a bit later.

martysai avatar Jul 18 '22 09:07 martysai

@MaratSaidov Istio would be the way to go for production.

axsaucedo avatar Jul 27 '22 14:07 axsaucedo

Just looking at the scripts, the issue with Istio could have several causes. When configuring Istio you are required to create an Istio Gateway resource, and the default one has a pretty basic configuration. If you expect to be able to send SSL requests, you will need to configure the gateway accordingly. I would suggest making sure that these pieces are in place before running Seldon Core with Istio.
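
As a rough illustration of the Gateway point, the snippet below writes a Gateway manifest with both a plain HTTP server and an HTTPS server. Treat it as a sketch under assumptions rather than the configuration for this cluster: the gateway name and namespace (`seldon-gateway` in `istio-system`), the `istio: ingressgateway` selector label, and the TLS secret name `seldon-ingress-cert` are not taken from this thread and would need to match the actual Istio install and the gateway that Seldon Core is configured to use.

```shell
#!/usr/bin/env bash
# Sketch: write an Istio Gateway manifest with an HTTP and an HTTPS server.
# Assumptions (not from this thread): the gateway name and namespace, the
# ingressgateway selector label, and the TLS secret name "seldon-ingress-cert".
cat > seldon-gateway.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: seldon-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway   # must match the labels on the ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE                          # server-side TLS only
      credentialName: seldon-ingress-cert   # hypothetical Kubernetes TLS secret
    hosts:
    - "*"
EOF
```

The file could then be applied with `kubectl apply -f seldon-gateway.yaml`. The secret referenced by `credentialName` has to exist in the same namespace as the ingress gateway workload, otherwise the HTTPS server will not come up.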

axsaucedo avatar Jul 27 '22 15:07 axsaucedo

Closing. Please recreate if this is still an issue, or look at v2.

ukclivecox avatar Dec 19 '22 10:12 ukclivecox