tfx
Evaluator behavior when no thresholds configured
System information
- Have I specified the code to reproduce the issue (Yes, No): No
- Environment in which the code is executed: Kubeflow
- TensorFlow version: 2.5
- TFX Version: 1.2.1
- Python version: 3.7
Describe the current behavior
When running evaluator with an eval config that does not include any thresholds, the entire code path of producing a model blessing is skipped. The model is seemingly neither blessed nor not blessed. Nothing is written to the URI pointed to by the model blessing artifact.
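One way to notice this state downstream is to check the blessing URI directly. The helper below is a minimal sketch, not part of TFX: the function name `read_blessing` is hypothetical, and it assumes the evaluator marks its result with an empty file named `BLESSED` or `NOT_BLESSED` under the blessing artifact URI. It only handles local paths; a remote URI (for example on Kubeflow) would need a filesystem layer that understands it.

```python
import os
from typing import Optional


def read_blessing(blessing_uri: str) -> Optional[bool]:
    """Return True/False for a blessed/not-blessed model, None if nothing was written.

    Assumption: the result is recorded as an empty marker file named
    BLESSED or NOT_BLESSED inside the blessing artifact directory.
    """
    if os.path.exists(os.path.join(blessing_uri, "BLESSED")):
        return True
    if os.path.exists(os.path.join(blessing_uri, "NOT_BLESSED")):
        return False
    # Neither marker exists: the state described in this issue.
    return None
```

A `None` result corresponds to the case reported here, where the model is neither blessed nor not blessed.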
Describe the expected behavior
In the case that there are no thresholds configured, I believe the model has vacuously satisfied the 0 thresholds and should be blessed. Even if we disagree on the default behavior:
- this should be documented in evaluator.md (there is no documentation anywhere in TF* that specifies the behavior of the blessing w/r/t different possibilities for thresholds)
- the model blessing artifact should be created with either true or false value
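The difference between the observed and the expected behavior can be illustrated with a toy model. This is only a sketch of the two readings, not the evaluator's code; both function names are hypothetical, and each threshold is reduced to a single pass/fail boolean.

```python
from typing import Optional, Sequence


def observed_outcome(threshold_results: Sequence[bool]) -> Optional[bool]:
    """Toy model of the reported behavior: no thresholds means no blessing decision."""
    if not threshold_results:
        # Validation is skipped entirely, so there is no True/False result.
        return None
    return all(threshold_results)


def vacuous_outcome(threshold_results: Sequence[bool]) -> bool:
    """Toy model of the proposed behavior: zero thresholds are vacuously satisfied."""
    # all() of an empty sequence is True in Python.
    return all(threshold_results)
```

With no thresholds, `observed_outcome([])` yields no decision at all, while `vacuous_outcome([])` yields a blessing. The two agree whenever at least one threshold is configured.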
Standalone code to reproduce the issue
Running the evaluator with this eval_config produces no blessing. Adding a threshold for the ExampleCount metric causes the blessing to be computed.
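import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(
            signature_name="serving_default",
            label_key="label",
        )
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                # No threshold is attached to this metric.
                tfma.MetricConfig(class_name="ExampleCount"),
            ]
        )
    ],
    slicing_specs=[
        tfma.SlicingSpec(),  # empty spec: overall (unsliced) metrics
    ],
)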
Name of your Organization (Optional): Twitter
Other info / logs
The only indication that this is happening is the one line below: "no threshold configured, will not validate model"
INFO:absl:Evaluation complete. Results written to PATH/TO/ARTIFACT.
INFO:absl:No threshold configured, will not validate model.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 203969 succeeded.
INFO:absl:Cleaning up stateful execution info.
Thank you for the report! This seems to be the intended implementation: https://github.com/tensorflow/tfx/blob/49c24a6cd61054ea3e63ee7decb60099f5580739/tfx/components/evaluator/executor.py#L154-L158 But I agree with you that more documentation about the behavior would be better.
It is also unfortunate that tfx doesn't support "optional" output, so there is no way NOT to produce the blessing output for now.
@mdreves Hi, Mike. Do you have anything to add about the behavior?
To achieve this ("avoid accidentally blessing models when users forget to set thresholds"), I think the most reasonable behavior would be to produce a "not blessed" result rather than abstain from producing the artifact at all. They have identical effects, but my proposal is more explicit.
When there is no threshold configured, the binary is used as a pure evaluator. This is allowed so that multiple evaluators can be set up for a single model. If we instead configured every evaluator as a model validator, we would have multiple model validators (some with thresholds, some without) and would need an extra layer of logic to reason about the final outcome of the blessing, which is a lot more complex and can lead to unintended behavior.
There is a workaround: if you just want to always bless or unbless a model, you can set up a trivial threshold such as ExampleCount < -1 (trivially unbless) or ExampleCount > 0 (trivially bless).
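A minimal sketch of that workaround, assuming the `tfma.MetricThreshold` and `tfma.GenericValueThreshold` config messages and reusing the model spec from the original report. The variable names are illustrative only, and the bound values are the ones suggested above.

```python
import tensorflow_model_analysis as tfma

# Trivially bless: a lower bound of 0 on ExampleCount, which a non-empty
# evaluation set should satisfy.
always_bless = tfma.MetricThreshold(
    value_threshold=tfma.GenericValueThreshold(lower_bound={"value": 0})
)

# Trivially unbless: an upper bound of -1, which no example count can meet.
never_bless = tfma.MetricThreshold(
    value_threshold=tfma.GenericValueThreshold(upper_bound={"value": -1})
)

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(signature_name="serving_default", label_key="label")
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                # Swap in never_bless to force a "not blessed" result instead.
                tfma.MetricConfig(class_name="ExampleCount", threshold=always_bless),
            ]
        )
    ],
    slicing_specs=[tfma.SlicingSpec()],
)
```

Because a threshold is now present, the evaluator should take the validation code path and write a blessing result either way.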
Regardless of how it is conceived, the evaluator already does have the role as "pure evaluator" plus model validator because it produces the blessing. So my question still stands about whether a model that has been "purely evaluated" is blessed or not blessed. I don't feel strongly so making it not blessed is fine.
My concern still stands that this is undocumented behavior. One paragraph of explanation in https://github.com/tensorflow/tfx/blob/master/docs/guide/evaluator.md would probably address this.
Some of the information here may be misleading: the evaluator has the role of evaluator and only optionally the role of model validator, not "plus". Ideally, there shouldn't be any blessing artifact when it is used as a pure evaluator; that might be a bug or a feature request for TFX.
Another angle to look at this is: whether to bless or unbless a model without threshold is ambiguous. Blessing by default might be desired, but can be dangerous since a user might forget to set any threshold and a bad model is pushed by accident. Unblessing is safer, but it leads to a no-op, since the model won't be pushed anyway. And again, this makes the multi-evaluators + model validators blessing reasoning unnecessarily complicated as mentioned in the previous reply.
I am curious how, in your case, the evaluator's blessing result is used, or why it is causing issues.