seldon-core
seldon-core copied to clipboard
fix(executor): payloads are logged twice in Combiner
What this PR does / why we need it: The PR aims to fix Issue #5116. The bug description is mirrored below for easy reference. If the Seldon-core >= 1.16 and there is a combiner in the graph with logger enabled, the payloads of the Combiner will be logged twice.
Which issue(s) this PR fixes:
Fixes #5116
Special notes for your reviewer: As I mentioned on the issue page, it is better to first clarify/define the concept of operation/situation. In other words, we need to first figure out:
What is a opration/situation during a payload lifecycle within a graph node?
With a clear agreement on what the concept refers to, we can then move to the log rule/convention proposed by @cliveseldon.
the key is the ability to log any before and after situation so as long as that can happen while at the same time removing redundant calls
Otherwise redundant logs are inevitable if we log payloads wherever we consider an operation/situation.
From my point of view, we can clarify the situation/operation by leveraging the logic of the Predict method as it naturally outlines the complete lifecycle of a payload within a node.
See my comments below for details.
func (p *PredictorProcess) Predict(node *v1.PredictiveUnit, msg payload.SeldonPayload) (payload.SeldonPayload, error) {
// ...
// payloads before and after transformation input operation are logged inside
tmsg, err := p.transformInput(node, msg, puid)
if err != nil {
return tmsg, err
}
// Payloads before calling Children nodes and after collection/aggregation of children nodes' results are also logged
// We don't need to log payloads again within aggregate
cmsg, err := p.predictChildren(node, tmsg, puid)
if err != nil {
return cmsg, err
}
// payloads before and after transformation output operation are logged inside
response, err := p.transformOutput(node, cmsg, puid)
// ...
return response, err
}
To summarize, per the Predict method, a payload lifecycle within a graph node can be divided into 3 phases: transformInput, predictChildren, and transformOutput. Each stage can be regarded as a situation.
Note that if we restrict the definition of situation to the first function-call-level within Predict method, for the predictChildren method, no matter what type of node is ROUTER or COMBINER, we should just get the payloads logged before sending payload to children nodes and after collecting/aggregating results from children nodes respectively.
Thus, we don't need to do redundant payload logging in the inner aggregate or route function again.
Hi @caozhuozi. Thanks for your PR.
I'm waiting for a SeldonIO or todo member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.
/ok-to-test
/test-integration
/test integration
/test notebooks
/test integration
@caozhuozi: The following test failed, say /retest to rerun them all:
| Test name | Commit | Details | Rerun command |
|---|---|---|---|
| integration | 160212b525ff0f9fddaae84d2a0f1c27c6021ae1 | link | /test integration |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository. I understand the commands that are listed here.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.