fairing
Append Builder: Error seldon-core-microservice not found; Default image not correct
I tried building and deploying an endpoint:
fairing.config.set_deployer('serving', serving_class="LabelPrediction")
create_endpoint = fairing.config.fn(LabelPrediction)
create_endpoint()
The deployed pod ended up crashing with the following error.
kubectl logs fairing-deployer-9mkf8-57c554f979-c572d
container_linux.go:247: starting container process caused "exec: \"seldon-core-microservice\": executable file not found in $PATH"
Container spec is
spec:
  containers:
  - command:
    - seldon-core-microservice
    - LabelPrediction
    - REST
    - --service-type=MODEL
    - --persistence=0
    env:
    - name: FAIRING_RUNTIME
      value: "1"
    image: gcr.io/code-search-demo/fairing-job:BEF9445D
    imagePullPolicy: IfNotPresent
    name: model
    resources: {}
    securityContext:
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rqqln
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: gke-label-issues-040-label-issues-040-cf06394a-t02q
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-rqqln
    secret:
      defaultMode: 420
      secretName: default-token-rqqln
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2019-04-10T03:43:37Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2019-04-10T03:43:37Z
    message: 'containers with unready status: [model]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'containers with unready status: [model]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2019-04-10T03:43:37Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://dbe925a3d4fe386348ee5b6b4a2e533e3cf449af51af9335afe1298b5e9c656f
    image: gcr.io/code-search-demo/fairing-job:BEF9445D
    imageID: docker-pullable://gcr.io/code-search-demo/fairing-job@sha256:ddea06ddd16c92f80d9e14bf5e1c126635eedefdbe23c52067724c7430807262
    lastState:
      terminated:
        containerID: docker://dbe925a3d4fe386348ee5b6b4a2e533e3cf449af51af9335afe1298b5e9c656f
        exitCode: 127
        finishedAt: 2019-04-10T03:51:23Z
        message: |
          oci runtime error: container_linux.go:247: starting container process caused "exec: \"seldon-core-microservice\": executable file not found in $PATH"
        reason: ContainerCannotRun
        startedAt: 2019-04-10T03:51:23Z
    name: model
    ready: false
    restartCount: 6
    state:
      waiting:
        message: Back-off 5m0s restarting failed container=model pod=fairing-deployer-9mkf8-57c554f979-c572d_kubeflow(d2ee14f0-5b42-11e9-aa05-42010a8e0051)
        reason: CrashLoopBackOff
  hostIP: 10.142.0.10
  phase: Running
  podIP: 10.0.0.29
  qosClass: BestEffort
  startTime: 2019-04-10T03:43:37Z
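Exit code 127 together with `ContainerCannotRun` is what a container reports when its command cannot be resolved on `$PATH`. As a rough sketch, a check like the following could be run with the Python interpreter inside the built image to confirm whether the entrypoint is present. The helper name `find_entrypoint` is hypothetical, and only the standard library is used.

```python
import shutil


def find_entrypoint(command="seldon-core-microservice"):
    """Return the absolute path of `command` if it is on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_entrypoint()
    if path is None:
        # Matches the failure above: the image has no such executable on PATH.
        print("seldon-core-microservice is not on PATH in this image")
    else:
        print("found entrypoint at", path)
```

If this reports the entrypoint as missing, the base image most likely does not have seldon-core installed.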
It looks like the default image is
from fairing import constants
constants.constants.DEFAULT_BASE_IMAGE
gcr.io/kubeflow-images-public/fairing:dev
That image looks pretty old: Jan 23, 2019.
https://github.com/kubeflow/fairing/blob/736c025e6d77f135bda345e5030398d5d2ef654a/examples/prediction/README.md
Looks like maybe we should be using seldonio/seldon-core-s2i-python3:0.4 instead.
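One way to try a different base image is to pass it to the builder explicitly instead of relying on `DEFAULT_BASE_IMAGE`. The following is only a sketch, assuming the `append` builder accepts `base_image` and `registry` keyword arguments via `fairing.config.set_builder`. The helper names and the registry value are placeholders, not taken from this report.

```python
# Image suggested in this thread as a replacement for the default base image.
SELDON_BASE_IMAGE = "seldonio/seldon-core-s2i-python3:0.4"


def builder_kwargs(registry, base_image=SELDON_BASE_IMAGE):
    """Keyword arguments for the append builder (hypothetical helper)."""
    return {"registry": registry, "base_image": base_image}


def configure_builder(registry):
    """Point fairing's append builder at the Seldon base image."""
    # Imported lazily so this module loads even where fairing is not installed.
    import fairing

    # Assumption: set_builder forwards these kwargs to the append builder.
    fairing.config.set_builder("append", **builder_kwargs(registry))
```

Calling `configure_builder("gcr.io/<your-project>")` before `fairing.config.fn(...)` would then be expected to build on top of the Seldon image, if the assumption about `set_builder` holds for the fairing version in use.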
That fixed the original problem. Now I get a different error:
2019-04-10 04:12:01,837 - seldon_core.microservice:main:261 - INFO: Starting microservice.py:main
2019-04-10 04:12:01,839 - seldon_core.microservice:main:292 - INFO: Annotations: {}
Traceback (most recent call last):
  File "/usr/local/bin/seldon-core-microservice", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/seldon_core/microservice.py", line 294, in main
    interface_file = importlib.import_module(args.interface_name)
  File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'LabelPrediction'
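The traceback shows seldon-core calling `importlib.import_module` with the interface name, so the serving class has to live in an importable module of that same name. Here is a small standard-library sketch of that failure mode and how it goes away once such a module is on `sys.path`. The temporary directory and the stub `LabelPrediction` class are illustrative only and are not the code from this report.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(name):
    """Return True if `name` can be imported as a module, else False."""
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


def demo():
    # No LabelPrediction module is on sys.path yet, so the import fails.
    before = try_import("LabelPrediction")
    with tempfile.TemporaryDirectory() as tmp:
        # Write a stub module whose file name matches the interface name.
        Path(tmp, "LabelPrediction.py").write_text(
            "class LabelPrediction:\n"
            "    def predict(self, X, feature_names):\n"
            "        return X\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            after = try_import("LabelPrediction")
        finally:
            # Clean up so the demo leaves the interpreter state unchanged.
            sys.path.remove(tmp)
            sys.modules.pop("LabelPrediction", None)
    return before, after
```

In an environment with no existing `LabelPrediction` module, `demo()` should return `(False, True)`, which suggests the image in the report did not contain a `LabelPrediction.py` on its import path.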
So it looks like it's a bug with the default image not being valid.
I think this might be fixed by https://github.com/kubeflow/fairing/pull/207
@karthikv2k Any update on the status of this work?
I think one way to test this would be to use the demo notebook I used for kubecon https://github.com/jlewi/kubecon-demo/blob/master/ames-xgboost-build-train-deploy.ipynb
I was using a fork of fairing. It would be great if that fork wasn't needed and we could run against master. It's possible that is already the case because all the requisite fixes, like #207, have been merged into master.
No updates. I should be able to look into it next week.
@karthikv2k Any update on this?
It looks like this still isn't fixed. The code is still using gcr.io/kubeflow-images-public/fairing:dev: https://github.com/kubeflow/fairing/blob/87c1185cde356939494ff4e9631c0b490b27153a/fairing/constants/constants.py
And that image is still very old (January 22, 2019).
It looks like the code in the original bug report is using the higher-level API; using the append builder directly works just fine. See for example https://github.com/kubeflow/examples/tree/master/xgboost_synthetic
I've been trying to get something cool going with fairing for the past couple of days. Unfortunately I just ran into this one. In my case, it's based on the XGBoost sample notebook code, the high-level APIs one you spoke about (https://github.com/kubeflow/fairing/blob/master/examples/prediction/xgboost-high-level-apis.ipynb):
endpoint = PredictionEndpoint(HousingServe, input_files=['model.dat'],
                              service_type='LoadBalancer',
                              docker_registry=DOCKER_REGISTRY,
                              backend=BackendClass(build_context_source=BuildContext))
endpoint.create()
I understand Kubeflow and fairing are super early alpha, but I'm really keen to get something together and working. The fairing docs/samples appear to use the high-level API. Am I understanding correctly that it's just not working at the moment (and that the low-level APIs aren't documented)?
Is there anything I can do to inform myself of how to get something up and running? Should I try overriding the base image somehow in PredictionEndpoint - was there a known-good one or another image I can build myself somehow?
I've hit this as well, deploying my own model:
endpoint = PredictionEndpoint(MyModelServe, input_files=included_files,
                              service_type='LoadBalancer',
                              docker_registry='{}.dkr.ecr.{}.amazonaws.com'.format(AWS_ACCOUNT_ID, AWS_REGION),
                              backend=BackendClass(build_context_source=build_context))
endpoint.create()