
Evaluator behavior when no thresholds configured

Open micahjsmith opened this issue 2 years ago • 5 comments

System information

  • Have I specified the code to reproduce the issue (Yes, No): No
  • Environment in which the code is executed: Kubeflow
  • TensorFlow version: 2.5
  • TFX Version: 1.2.1
  • Python version: 3.7

Describe the current behavior

When running evaluator with an eval config that does not include any thresholds, the entire code path of producing a model blessing is skipped. The model is seemingly neither blessed nor not blessed. Nothing is written to the URI pointed to by the model blessing artifact.

Describe the expected behavior

In the case that there are no thresholds configured, I believe the model has vacuously satisfied the 0 thresholds and should be blessed. Even if we disagree on the default behavior:

  1. this should be documented in evaluator.md (there is no documentation anywhere in TF* that specifies the behavior of the blessing w/r/t different possibilities for thresholds)
  2. the model blessing artifact should be created with either true or false value
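The difference between the reported behavior and the "vacuously satisfied" reading can be sketched in a few lines. This is a simplified model, not the TFX executor code: the function names, the lower-bound-only thresholds, and the use of `None` to stand for "no blessing written" are all assumptions made for illustration. It relies only on the fact that Python's `all()` returns `True` for an empty iterable.

```python
from typing import Dict, Optional


def passes(value: float, lower_bound: float) -> bool:
    """Return True when a metric value meets its (hypothetical) lower bound."""
    return value >= lower_bound


def observed_blessing(metrics: Dict[str, float],
                      lower_bounds: Dict[str, float]) -> Optional[bool]:
    """Model of the reported behavior: no thresholds means no verdict at all."""
    if not lower_bounds:
        return None  # stands for "nothing is written to the blessing URI"
    return all(passes(metrics[name], bound)
               for name, bound in lower_bounds.items())


def vacuous_blessing(metrics: Dict[str, float],
                     lower_bounds: Dict[str, float]) -> bool:
    """Model of the proposed behavior: zero thresholds are trivially satisfied."""
    return all(passes(metrics[name], bound)
               for name, bound in lower_bounds.items())


metrics = {"example_count": 1000.0}
print(observed_blessing(metrics, {}))  # None
print(vacuous_blessing(metrics, {}))   # True
```

With at least one threshold configured, both functions agree; they differ only in the empty case, which is the case this issue is about.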

Standalone code to reproduce the issue

Running the evaluator with this eval_config leads to no blessing. Adding a threshold for the ExampleCount metric leads to the blessing being computed.

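import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(
            signature_name="serving_default",
            label_key="label",
        )
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                # No threshold is attached to this metric.
                tfma.MetricConfig(class_name="ExampleCount"),
            ]
        )
    ],
    slicing_specs=[
        # An empty SlicingSpec selects the overall (unsliced) dataset.
        tfma.SlicingSpec(),
    ],
)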

Name of your Organization (Optional) Twitter

Other info / logs

The only indication that this is happening is the one line below: "no threshold configured, will not validate model"

INFO:absl:Evaluation complete. Results written to PATH/TO/ARTIFACT.
INFO:absl:No threshold configured, will not validate model.
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 203969 succeeded.
INFO:absl:Cleaning up stateful execution info.

micahjsmith avatar Aug 16 '22 21:08 micahjsmith

Thank you for the report! This seems to be the intended implementation: https://github.com/tensorflow/tfx/blob/49c24a6cd61054ea3e63ee7decb60099f5580739/tfx/components/evaluator/executor.py#L154-L158. But I agree with you that more documentation about the behavior would be better.

It is also unfortunate that tfx doesn't support "optional" output, so there is no way NOT to produce the blessing output for now.

@mdreves Hi, Mike. Do you have anything to add about the behavior?

jiyongjung0 avatar Aug 17 '22 10:08 jiyongjung0

To achieve this ("avoid accidentally blessing models when users forget to set thresholds"), I think the most reasonable behavior would be to produce a "not blessed" result, rather than abstain from producing the artifact at all. They have identical effects, but I think my proposal is more explicit.

micahjsmith avatar Aug 17 '22 14:08 micahjsmith

When there is no threshold configured, the binary is used as a pure evaluator. This is allowed so that multiple evaluators can be set up for a single model. If we instead treated every one of them as a model validator (some with thresholds, some without), we would need an extra layer of logic to reason about the final outcome of the blessing, which is a lot more complex and can lead to unintended behavior.

There is a workaround: if you just want to always bless or unbless a model, you can set up a trivial threshold such as ExampleCount < -1 (trivially unbless) or ExampleCount > 0 (trivially bless).
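A sketch of what such a trivial threshold might look like in an eval config follows. It assumes the `threshold` field of `tfma.MetricConfig` together with `tfma.MetricThreshold` and `tfma.GenericValueThreshold`, as used in the TFX Evaluator guide; it has not been run against the TFX 1.2.1 setup from this report, and the variable names are illustrative only.

```python
import tensorflow_model_analysis as tfma

# Trivially bless: any non-empty evaluation set satisfies a lower bound of 0.
always_bless = tfma.MetricThreshold(
    value_threshold=tfma.GenericValueThreshold(lower_bound={"value": 0})
)

# Trivially unbless: an example count can never be at or below -1.
never_bless = tfma.MetricThreshold(
    value_threshold=tfma.GenericValueThreshold(upper_bound={"value": -1})
)

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(signature_name="serving_default", label_key="label")
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                # Swap in never_bless to force a "not blessed" result instead.
                tfma.MetricConfig(class_name="ExampleCount",
                                  threshold=always_bless),
            ]
        )
    ],
    slicing_specs=[tfma.SlicingSpec()],
)
```

Either variant gives the evaluator a threshold to check, so the validation code path runs and a blessing result is produced.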

genehwung avatar Aug 29 '22 21:08 genehwung

> When there is no threshold configured, the binary is used as a pure evaluator. This is allowed so that multiple evaluators can be set up for a single model. If we instead treated every one of them as a model validator (some with thresholds, some without), we would need an extra layer of logic to reason about the final outcome of the blessing, which is a lot more complex and can lead to unintended behavior.

Regardless of how it is conceived, the evaluator already does have the role as "pure evaluator" plus model validator because it produces the blessing. So my question still stands about whether a model that has been "purely evaluated" is blessed or not blessed. I don't feel strongly so making it not blessed is fine.

My concern still stands that this is undocumented behavior. One paragraph of explanation in https://github.com/tensorflow/tfx/blob/master/docs/guide/evaluator.md would probably address it.

micahjsmith avatar Sep 09 '22 19:09 micahjsmith

> Regardless of how it is conceived, the evaluator already does have the role as "pure evaluator" plus model validator because it produces the blessing. So my question still stands about whether a model that has been "purely evaluated" is blessed or not blessed.

There might be some misleading information here: the evaluator has the role of evaluator and only optionally the role of model validator, not both at once. Ideally, there shouldn't be any blessing artifact when it is used as a pure evaluator; that might be a bug or a feature request for TFX.

Another angle to look at this: whether to bless or unbless a model without thresholds is ambiguous. Blessing by default might be desired, but it can be dangerous, since a user might forget to set any threshold and a bad model could be pushed by accident. Unblessing is safer, but it amounts to a no-op, since the model won't be pushed anyway. And again, this makes reasoning about blessings across multiple evaluators and model validators unnecessarily complicated, as mentioned in the previous reply.
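To illustrate the complication, here is a hypothetical combination rule for several evaluators attached to one model. It is not something TFX implements; `combine_verdicts` and the convention that `None` means "ran as a pure evaluator, no verdict" are assumptions for this sketch.

```python
from typing import Iterable, Optional


def combine_verdicts(verdicts: Iterable[Optional[bool]]) -> Optional[bool]:
    """Combine per-evaluator verdicts into one blessing decision.

    None means the evaluator had no thresholds and abstained.
    """
    decided = [v for v in verdicts if v is not None]
    if not decided:
        return None  # every evaluator abstained
    return all(decided)


# A validator that passed plus a pure evaluator that abstains.
print(combine_verdicts([True, None]))   # True

# The same setup if the pure evaluator reported "not blessed" by default.
print(combine_verdicts([True, False]))  # False
```

Under this rule, a pure evaluator that defaulted to "not blessed" would veto every model, while one that defaulted to "blessed" would never matter; abstaining sidesteps having to choose.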

I am curious how, in your case, the evaluator's blessing result is used, or why it is causing issues.

genehwung avatar Sep 09 '22 22:09 genehwung