TFX 1.9.0 Issues

Open rtg0795 opened this issue 3 years ago • 3 comments

Please comment or link any issues you find with TFX 1.9.0

Thanks.

rtg0795 avatar Jul 18 '22 19:07 rtg0795

Issue #5039

JPXKQX avatar Jul 27 '22 11:07 JPXKQX

Still running into problems with Dataflow jobs being stuck and killed after 1 hour. TFX 1.9.1 with Apache Beam 2.40.0, using the TFX docker image tensorflow/tfx:1.9.1

Error message from Dataflow:

Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. Please check the worker logs in Stackdriver Logging. You can also get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.

Related to https://github.com/tensorflow/tfx/issues/4902#issuecomment-1154518518

Edit: Found the solution. It turns out this is related to a known issue. Setting the flag --experiments=disable_worker_container_image_prepull resolved it. The TFX image keeps getting larger (TFX 1.9.1 is around 17GB uncompressed), which means we will probably need to keep using this workaround in future releases.
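For anyone applying the same workaround, here is a minimal sketch of how the flag could be added to the Beam arguments for a Dataflow run. The helper name `build_beam_pipeline_args` and the project, region, and bucket values are hypothetical placeholders, not taken from the setup above; only the `--experiments` flag comes from this comment.

```python
from typing import List

# Flag quoted in the comment above; everything else here is illustrative.
PREPULL_WORKAROUND = "--experiments=disable_worker_container_image_prepull"


def build_beam_pipeline_args(project: str, region: str, temp_location: str) -> List[str]:
    """Return Beam args for a Dataflow run with the prepull workaround appended.

    `project`, `region`, and `temp_location` are placeholders for your own
    GCP settings (assumed values, not from the original report).
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        PREPULL_WORKAROUND,
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    args = build_beam_pipeline_args("my-gcp-project", "us-central1", "gs://my-bucket/tmp")
    print("\n".join(args))
```

The resulting list would typically be passed as `beam_pipeline_args` when constructing the TFX pipeline, so that every Beam-based component picks up the flag.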

EdwardCuiPeacock avatar Aug 05 '22 00:08 EdwardCuiPeacock

Running data_view components on Vertex AI throws the same error as the first issue reported in #4472. Image version: tensorflow/tfx:1.9.1

KimuraTian avatar Sep 24 '22 06:09 KimuraTian