tfx
Cache skipped with same execution properties
System information
- Have I specified the code to reproduce the issue (Yes, No): No
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Kubeflow Pipelines GKE deployment - KF 1.5 version
- TensorFlow version: 2.5.1
- TFX Version: 1.2.1
- Python version: 3.7
- Python dependencies (from pip freeze output):
Describe the current behavior
When running an ExampleGen with identical execution properties on DataflowRunner, the cache gets skipped.
Describe the expected behavior
Cache gets hit.
Standalone code to reproduce the issue
No code, but we do have execution -> https://gist.github.com/casassg/cdc5e7ef216ceac90f49adc0b7721c11
Name of your Organization (Optional) Twitter
Other info / logs
It is important to note that we do see a difference in beam_pipeline_args, but we are not clear whether that gets used for cache validation or not.
Does the query string change?
Note that beam_pipeline_args differs because we built a container image and pushed it with 2 tags. However, I've been digging through the TFX code that checks for previous executions, and I have not been able to find the place where it breaks the cache.
Could you try rerunning with exactly the same beam_pipeline_args to see if it caches?
The query string stays the same; I had to scrub it from the gist for privacy reasons, but I can assure you it's the same. Also, when running with the same beam_pipeline_args, it caches (I cloned the run in the UI to validate this).
I'm 90% sure this is due to beam_pipeline_args changing, but this also seems quite unintuitive. Why does the cache get invalidated by a class property like this? Ideally that should only modify how the component gets executed; if the inputs and exec_properties are the same, it should hit the cache when caching is enabled. That said, I have not been able to figure out which logic this is hitting.
Having the same issue. context.run(example_gen, enable_cache=True) in a Jupyter notebook is not using the cache, whereas the following execution is using the cache (executed as Python code from IntelliJ):
# Assumption: `tfx` refers to the v1 public API; METADATA_PATH, PIPELINE_ROOT,
# PIPELINE_NAME and components are defined elsewhere in the poster's script.
from tfx import v1 as tfx

metadata_connection_config = tfx.orchestration.metadata.sqlite_metadata_connection_config(METADATA_PATH.as_posix())
pipeline = tfx.dsl.Pipeline(
    pipeline_name=PIPELINE_NAME,
    pipeline_root=PIPELINE_ROOT.as_posix(),
    components=components,
    enable_cache=True,
    metadata_connection_config=metadata_connection_config
)
result = tfx.orchestration.LocalDagRunner().run(pipeline)
Is there any way to force StatisticsGen to use a given version of the ExampleGen output, i.e. to reuse a previous output?
@casassg, @ismailsimsek ,
The cache key is generated by applying the SHA-256 hash function to:
- Serialized pipeline info.
- Serialized node_info of the PipelineNode.
- Serialized executor spec.
- Serialized input artifacts, if any.
- Serialized output artifacts, if any. The uri is removed during the process.
- Serialized parameters, if any.
- Serialized module file content, if a module file is present in the parameters.
Changing any of the above invalidates the cache. Make sure all of them stay constant; if the cache still gets skipped after that, please let us know. Thank you!
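To illustrate the list above, here is a minimal sketch of a cache key computed over serialized parts. It is not the TFX implementation: the function name sketch_cache_key, the dict-based inputs, and the use of json.dumps as a stand-in for the real serialization are all assumptions made for illustration.

```python
import hashlib
import json


def sketch_cache_key(pipeline_info, node_info, executor_spec,
                     input_artifacts, output_artifacts, parameters,
                     module_file_content=None):
    """Hash the serialized parts into one hex digest (illustrative only)."""
    # Drop the uri from each output artifact before hashing.
    outputs_without_uri = [
        {k: v for k, v in artifact.items() if k != "uri"}
        for artifact in output_artifacts
    ]
    hasher = hashlib.sha256()
    for part in (pipeline_info, node_info, executor_spec,
                 input_artifacts, outputs_without_uri, parameters):
        # json.dumps with sort_keys stands in for the real serialization.
        hasher.update(json.dumps(part, sort_keys=True).encode("utf-8"))
    if module_file_content is not None:
        hasher.update(module_file_content.encode("utf-8"))
    return hasher.hexdigest()


# Hypothetical values, chosen only to exercise the sketch.
base = dict(
    pipeline_info={"id": "my_pipeline"},
    node_info={"id": "ExampleGen"},
    executor_spec={"beam_pipeline_args": ["--runner=DataflowRunner"]},
    input_artifacts=[],
    output_artifacts=[{"type": "Examples", "uri": "gs://bucket/run1"}],
    parameters={"query": "SELECT 1"},
)
key_1 = sketch_cache_key(**base)

# A different output uri does not change the key, since the uri is dropped.
changed_uri = dict(
    base, output_artifacts=[{"type": "Examples", "uri": "gs://bucket/run2"}])
key_2 = sketch_cache_key(**changed_uri)

# A different executor spec does change the key.
changed_spec = dict(
    base, executor_spec={"beam_pipeline_args": ["--runner=DirectRunner"]})
key_3 = sketch_cache_key(**changed_spec)
```

In this sketch, any byte-level difference in a hashed part yields a different digest, which is why a change that looks cosmetic can still cause a cache miss.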
Note that the issue is that beam pipeline args are part of the cache key (as those are execution configuration for Beam). Also, I'm no longer working with TFX, so unfortunately I can't test.
Adding on to @singhniraj08 above, here is why beam pipeline args are considered part of the cache:
- beam_pipeline_args is part of BeamExecutorSpec on BeamComponents (e.g. the ExampleGen under discussion in this issue is a subclass of BeamComponent): https://github.com/tensorflow/tfx/blob/master/tfx/components/example_gen/csv_example_gen/component.py
- executor_spec is part of the cache context when launching components: https://github.com/tensorflow/tfx/blob/master/tfx/orchestration/portable/launcher.py#L371
- Therefore, beam_pipeline_args will be considered when choosing cache.
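Since any difference in beam_pipeline_args can lead to a cache miss, it may help to compare the args of two runs directly. The helper below is a hypothetical debugging aid, not part of TFX; the function name diff_beam_args and the example flag values (including the image tags) are made up for illustration.

```python
def diff_beam_args(args_a, args_b):
    """Return the flags present in only one of the two arg lists."""
    only_in_a = sorted(set(args_a) - set(args_b))
    only_in_b = sorted(set(args_b) - set(args_a))
    return only_in_a, only_in_b


# Hypothetical args from two runs that differ only in the container image tag.
run_a = [
    "--runner=DataflowRunner",
    "--sdk_container_image=gcr.io/example/worker:tag-a",
]
run_b = [
    "--runner=DataflowRunner",
    "--sdk_container_image=gcr.io/example/worker:tag-b",
]

only_a, only_b = diff_beam_args(run_a, run_b)
print("only in run A:", only_a)
print("only in run B:", only_b)
```

If both lists come back empty, the args match as sets and the cache miss likely comes from another part of the key. Note that this comparison ignores ordering, so two lists with the same flags in a different order would also show no difference here.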
This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for past 7 days.