seldon-core
Change service key to allow container services to always match correctly
What this PR does / why we need it:
Services created for each node in the inference graph used the same label key, which meant only one Service per pod could match correctly. This was acceptable when the service orchestrator ran in the same pod, since it did not use the Services but called localhost directly.
The change is to create a label per node name so the Services always correctly find their pods (see the sketch after the list below).
- Adds unique service label
- Adds an example notebook with tests for various transform-model-transform flows.
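As a rough illustration of the fix (the actual controller code is in Go; the label key format, function name, and port below are hypothetical, not the operator's real API), the idea is that each node's Service gets a selector keyed on the node name, so no two Services in the same SeldonDeployment can match the same pods by accident:

```python
# Illustrative sketch only, not the actual operator code.
from kubernetes import client

def per_node_service(deployment_name: str, node_name: str) -> client.V1Service:
    """Build a Service whose selector is unique to one graph node."""
    label_key = f"seldon.io/model-{node_name}"  # hypothetical per-node key
    selector = {label_key: deployment_name}
    return client.V1Service(
        metadata=client.V1ObjectMeta(name=f"{deployment_name}-{node_name}"),
        spec=client.V1ServiceSpec(
            selector=selector,
            ports=[client.V1ServicePort(port=9000, target_port=9000)],
        ),
    )

# Before the fix, every node's Service used the same label key, so all of
# them selected the same pods; with a per-node key each Service matches
# exactly the pods running that node's container.
```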
Which issue(s) this PR fixes:
Fixes #4036
Special notes for your reviewer:
Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
/test integration
/test notebooks
/test integration
/test notebooks
/test integration
/test notebooks
It seems all integration tests passed (not sure why the run is marked as failed, since the log says pass) and only 1 notebook test failed (which is often flaky).
Edit: ok, it seems the parallel tests are failing in the integration suite.
/test integration
/test notebooks
Ok, it seems the operator upgrade tests and the tracing test are still failing, so it may be a real issue; rerunning to confirm.
/test integration
/test notebooks
/test integration
/test notebooks
/test notebooks
@cliveseldon I've been testing this locally and it all seems to work; the svcorch model does work in 1.14.0-dev. I am seeing some strange behaviour, though it's not clear whether it comes from this PR or whether it's only on my setup: I'm currently testing on one of Clive's branches, and when I run the helm upgrade from 1.13.1 -> 1.14.0-dev it doesn't trigger a model container bounce for some strange reason (but it does for 1.12.0 -> 1.13.1 as well as for 1.12.0 -> 1.14.0-dev). Is this behaviour consistent for you as well?
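For reference, this is roughly how one can check locally whether an upgrade bounced the model pods; a minimal sketch assuming a configured kubeconfig and the official kubernetes Python client (the namespace and label selector below are placeholders, not Seldon's actual labels):

```python
# Hypothetical helper: compare pod creation timestamps before and after
# a `helm upgrade` to see whether the model containers were bounced.
from kubernetes import client, config

def pod_start_times(namespace: str, label_selector: str) -> dict:
    """Map pod name -> creation timestamp for pods matching the selector."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector=label_selector)
    return {p.metadata.name: p.metadata.creation_timestamp for p in pods.items}

# Usage: snapshot before the upgrade, snapshot after, and diff.
# before = pod_start_times("seldon", "app=my-model")  # placeholder selector
# ... run `helm upgrade` ...
# after = pod_start_times("seldon", "app=my-model")
# bounced = set(after) - set(before)  # new pod names indicate a bounce
```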
/test notebooks

/test integration
/test integration
It seems the only flaky test is the rolling upgrade from 1.14.0. From discussion, rolling updates are expected to potentially fail, so we should be good to merge; re-running to validate the flakiness of this specific test.
/test integration
@cliveseldon: The following test failed, say /retest to rerun them all:
| Test name | Commit | Details | Rerun command |
|---|---|---|---|
| integration | 3c489380348757a6625e100c63cfc89469b7e0e6 | link | /test integration |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository. I understand the commands that are listed here.
It seems it was indeed flaky, as this time the failing test was test_label_update[1.13.1] - looks good to merge.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: axsaucedo
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [axsaucedo]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment