TFX 1.9.0 Issues
Please comment on or link any issues you find with TFX 1.9.0.
Thanks.
Issue #5039
Still running into problems with Dataflow jobs getting stuck and being killed after 1 hour. This is with TFX 1.9.1 and Apache Beam 2.40.0, using the TFX Docker image tensorflow/tfx:1.9.1.
Error message from Dataflow:
Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. Please check the worker logs in Stackdriver Logging. You can also get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
Related to https://github.com/tensorflow/tfx/issues/4902#issuecomment-1154518518
Edit: Found the solution. It turns out to be related to this known issue. Setting the flag --experiments=disable_worker_container_image_prepull helped resolve the issue. The TFX image has been getting larger and larger (TFX 1.9.1 is around 17GB uncompressed), which means we would need to keep using this workaround in future releases.
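For anyone else hitting this, here is a minimal sketch of how the flag could be added to the Beam pipeline args. The helper name `dataflow_beam_args` and the project, region, and bucket values are placeholders I made up for illustration, not taken from the original setup; only the `--experiments` flag comes from the workaround above.

```python
from typing import List


def dataflow_beam_args(project: str, region: str, temp_location: str) -> List[str]:
    """Build Beam pipeline args for Dataflow with image pre-pull disabled.

    Hypothetical helper for illustration; adjust to your own pipeline setup.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Workaround from this thread: skip pre-pulling the large worker image.
        "--experiments=disable_worker_container_image_prepull",
    ]


# Placeholder values, not from the original report.
beam_pipeline_args = dataflow_beam_args(
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)
print(beam_pipeline_args)
```

The resulting list would typically be passed as `beam_pipeline_args` when constructing the TFX pipeline, assuming the pipeline is already configured to run its Beam components on Dataflow.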
Running data_view components on Vertex AI throws the same error as the first issue reported in #4472. Image version: tensorflow/tfx:1.9.1