seldon-core icon indicating copy to clipboard operation
seldon-core copied to clipboard

fix(executor): payloads are logged twice in Combiner

Open caozhuozi opened this issue 2 years ago • 8 comments

What this PR does / why we need it: The PR aims to fix Issue #5116. The bug description is mirrored below for easy reference. If the Seldon-core >= 1.16 and there is a combiner in the graph with logger enabled, the payloads of the Combiner will be logged twice.

Which issue(s) this PR fixes:

Fixes #5116

Special notes for your reviewer: As I mentioned on the issue page, it is better to first clarify/define the concept of operation/situation. In other words, we need to first figure out:

What is a opration/situation during a payload lifecycle within a graph node?

With a clear agreement on what the concept refers to, we can then move to the log rule/convention proposed by @cliveseldon.

the key is the ability to log any before and after situation so as long as that can happen while at the same time removing redundant calls

Otherwise redundant logs are inevitable if we log payloads wherever we consider an operation/situation.

From my point of view, we can clarify the situation/operation by leveraging the logic of the Predict method as it naturally outlines the complete lifecycle of a payload within a node. See my comments below for details.

func (p *PredictorProcess) Predict(node *v1.PredictiveUnit, msg payload.SeldonPayload) (payload.SeldonPayload, error) {
        // ...

        // payloads before and after transformation input operation are logged inside
	tmsg, err := p.transformInput(node, msg, puid)
	if err != nil {
		return tmsg, err
	}
        // Payloads before calling Children nodes and after collection/aggregation of children nodes' results are also logged
        // We don't need to log payloads again within aggregate
	cmsg, err := p.predictChildren(node, tmsg, puid)
	if err != nil {
		return cmsg, err
	}
        // payloads before and after transformation output operation are logged inside
	response, err := p.transformOutput(node, cmsg, puid)

        // ...
	return response, err
}

To summarize, per the Predict method, a payload lifecycle within a graph node can be divided into 3 phases: transformInput, predictChildren, and transformOutput. Each stage can be regarded as a situation.

Note that if we restrict the definition of situation to the first function-call-level within Predict method, for the predictChildren method, no matter what type of node is ROUTER or COMBINER, we should just get the payloads logged before sending payload to children nodes and after collecting/aggregating results from children nodes respectively.

Thus, we don't need to do redundant payload logging in the inner aggregate or route function again.

caozhuozi avatar Sep 02 '23 09:09 caozhuozi

Hi @caozhuozi. Thanks for your PR.

I'm waiting for a SeldonIO or todo member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

seldondev avatar Sep 02 '23 09:09 seldondev

/ok-to-test

ukclivecox avatar Oct 09 '23 08:10 ukclivecox

/test-integration

ukclivecox avatar Oct 09 '23 08:10 ukclivecox

/test integration

ukclivecox avatar Oct 09 '23 08:10 ukclivecox

/test notebooks

ukclivecox avatar Oct 09 '23 08:10 ukclivecox

/test integration

ukclivecox avatar Oct 09 '23 09:10 ukclivecox

@caozhuozi: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
integration 160212b525ff0f9fddaae84d2a0f1c27c6021ae1 link /test integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository. I understand the commands that are listed here.

seldondev avatar Oct 09 '23 11:10 seldondev

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Apr 12 '24 14:04 CLAassistant