seldon-core

Change service key to allow container services to always match correctly

Open • ukclivecox opened this issue 3 years ago • 19 comments

What this PR does / why we need it:

Services created for each node in the inference graph all used the same label key, which meant only one service per pod would actually match (be active). This is fine when the service orchestrator runs in the same pod as the model containers, because it then bypasses the services and calls the containers over localhost directly.

This change creates a label per node name, so each service always selects the correct pods.
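The label collision described above can be sketched as a small simulation. This is a minimal illustration, assuming one pod hosts several graph nodes as containers and that a Kubernetes Service matches a pod when every key/value pair in its selector appears in the pod's labels. The label keys `example.io/svc` (shared) and `example.io/svc-<node>` (per node) and the node names are hypothetical placeholders, not the keys Seldon Core actually uses.

```python
from typing import Dict, List

Labels = Dict[str, str]

# Hypothetical label keys, for illustration only (not Seldon Core's real keys).
SHARED_KEY = "example.io/svc"


def node_key(node: str, per_node_keys: bool) -> str:
    """Label key a node's Service selects on: shared, or unique per node."""
    return f"{SHARED_KEY}-{node}" if per_node_keys else SHARED_KEY


def selects(selector: Labels, pod_labels: Labels) -> bool:
    """A Service selector matches a pod when every key/value pair is present."""
    return all(pod_labels.get(k) == v for k, v in selector.items())


def label_pod(nodes: List[str], per_node_keys: bool) -> Labels:
    """Labels on one pod hosting several graph nodes.

    With a shared key, each node overwrites the previous value, because a pod
    can carry only one value per label key.
    """
    labels: Labels = {}
    for node in nodes:
        labels[node_key(node, per_node_keys)] = node
    return labels


def matching_services(nodes: List[str], per_node_keys: bool) -> List[str]:
    """Return the nodes whose Service selector still matches the shared pod."""
    pod = label_pod(nodes, per_node_keys)
    return [n for n in nodes if selects({node_key(n, per_node_keys): n}, pod)]


if __name__ == "__main__":
    graph = ["input-transformer", "classifier", "output-transformer"]
    # Shared key: only the last node written still matches the pod.
    print(matching_services(graph, per_node_keys=False))
    # Per-node keys: every node's Service matches the pod.
    print(matching_services(graph, per_node_keys=True))
```

With the shared key only `output-transformer` keeps a working Service, while per-node keys let all three Services find the pod, which mirrors the behaviour the PR description reports.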

  • Adds a unique service label per node
  • Adds an example notebook with tests for various transform-model-transform flows.

Which issue(s) this PR fixes:

Fixes #4036

Special notes for your reviewer:

ukclivecox • Apr 08 '22 16:04


/test integration

ukclivecox • Apr 08 '22 16:04

/test notebooks

ukclivecox • Apr 08 '22 16:04

/test integration

ukclivecox • Apr 08 '22 17:04

/test notebooks

ukclivecox • Apr 11 '22 08:04

/test integration

ukclivecox • Apr 11 '22 09:04

/test notebooks

ukclivecox • Apr 11 '22 09:04

It seems all integration tests passed (not sure why the run is marked as failed when the details here say pass), and only one notebook test failed (that one is often flaky).

Edit: OK, it seems the parallel tests are failing in the integration run.

axsaucedo • Apr 19 '22 07:04

/test integration

axsaucedo • Apr 19 '22 07:04

/test notebooks

axsaucedo • Apr 19 '22 07:04

OK, it seems the operator upgrade tests and the tracing test are still failing, so there may be a real issue; rerunning to confirm.

axsaucedo • Apr 19 '22 13:04

/test integration

axsaucedo • Apr 19 '22 13:04

/test notebooks

axsaucedo • Apr 19 '22 13:04

/test integration

axsaucedo • Apr 21 '22 07:04

/test notebooks

axsaucedo • Apr 21 '22 07:04

/test notebooks

axsaucedo • Apr 21 '22 15:04

@cliveseldon I've been testing this locally and everything seems to work well: the svcorch model does work in 1.14.0-dev. I am seeing some strange behaviour, though it's not clear whether it comes from this PR or whether it's only me. I'm currently testing on one of Clive's branches, and when I run the helm upgrade from 1.13.1 -> 1.14.0-dev it doesn't trigger a bounce of the model containers for some reason (it does for 1.12.0 -> 1.13.1 as well as for 1.12.0 -> 1.14.0-dev). Is this behaviour consistent for you as well?

axsaucedo • Apr 22 '22 07:04

/test notebooks

ukclivecox • Aug 27 '22 12:08

[Screenshot: Screenshot_2022-08-27_15-37-02]

ukclivecox • Aug 27 '22 14:08

/test integration

ukclivecox • Aug 27 '22 14:08

/test integration

axsaucedo • Sep 05 '22 10:09

It seems the only flaky test is the rolling upgrade from 1.14.0. From our discussion, rolling updates are expected to fail occasionally, so we should be good to merge; rerunning to confirm this specific test is just flaky.

/test integration

axsaucedo • Sep 05 '22 15:09

@cliveseldon: The following test failed, say /retest to rerun them all:

Test name: integration • Commit: 3c489380348757a6625e100c63cfc89469b7e0e6 • Rerun command: /test integration


seldondev • Sep 05 '22 18:09

It was indeed flaky: this time the failing test was test_label_update[1.13.1]. Looks good to merge. /approve

axsaucedo • Sep 06 '22 09:09

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: axsaucedo


seldondev • Sep 06 '22 09:09