Pipeline exception when pipeline_root points to s3
System information
- Have I specified the code to reproduce the issue (Yes, No): Yes
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Local Linux
- TensorFlow version: 2.6.2
- TFX Version: 1.3.3
- Python version: 3.8.12
- Python dependencies (from pip freeze output):
absl-py==0.12.0
apache-beam==2.33.0
argon2-cffi==21.1.0
astunparse==1.6.3
attrs==20.3.0
avro-python3==1.9.2.1
backcall==0.2.0
bleach==4.1.0
boto3==1.20.0
botocore==1.23.1
cachetools==4.2.4
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.7
clang==5.0
click==7.1.2
crcmod==1.7
debugpy==1.5.1
decorator==5.1.0
defusedxml==0.7.1
dill==0.3.1.1
docker==4.4.4
docopt==0.6.2
entrypoints==0.3
fastavro==1.4.7
fasteners==0.16.3
flatbuffers==1.12
future==0.18.2
gast==0.4.0
google-api-core==1.31.4
google-api-python-client==1.12.8
google-apitools==0.5.31
google-auth==1.35.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-cloud-aiplatform==1.7.0
google-cloud-bigquery==2.30.1
google-cloud-bigtable==1.7.0
google-cloud-core==1.7.2
google-cloud-datastore==1.15.3
google-cloud-dlp==1.0.0
google-cloud-language==1.3.0
google-cloud-pubsub==1.7.0
google-cloud-recommendations-ai==0.2.0
google-cloud-spanner==1.19.1
google-cloud-storage==1.42.3
google-cloud-videointelligence==1.16.1
google-cloud-vision==1.0.0
google-crc32c==1.3.0
google-pasta==0.2.0
google-resumable-media==2.1.0
googleapis-common-protos==1.53.0
grpc-google-iam-v1==0.12.3
grpcio==1.41.1
grpcio-gcp==0.2.2
h5py==3.1.0
hdfs==2.6.0
httplib2==0.19.1
idna==3.3
importlib-resources==5.4.0
ipykernel==6.5.0
ipython==7.29.0
ipython-genutils==0.2.0
ipywidgets==7.6.5
jedi==0.18.0
Jinja2==3.0.2
jmespath==0.10.0
joblib==0.14.1
jsonschema==4.2.1
jupyter-client==7.0.6
jupyter-core==4.9.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.2
keras==2.6.0
Keras-Preprocessing==1.1.2
keras-tuner==1.1.0
kt-legacy==1.0.4
kubernetes==12.0.1
Markdown==3.3.4
MarkupSafe==2.0.1
matplotlib-inline==0.1.3
mistune==0.8.4
ml-metadata==1.3.0
ml-pipelines-sdk==1.3.3
nbclient==0.5.5
nbconvert==6.2.0
nbformat==5.1.3
nest-asyncio==1.5.1
notebook==6.4.5
numpy==1.19.5
oauth2client==4.1.3
oauthlib==3.1.1
opt-einsum==3.3.0
orjson==3.6.4
packaging==20.9
pandas==1.3.4
pandocfilters==1.5.0
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
portpicker==1.5.0
prometheus-client==0.12.0
prompt-toolkit==3.0.22
proto-plus==1.19.7
protobuf==3.19.1
psutil==5.8.0
ptyprocess==0.7.0
pyarrow==2.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydot==1.4.2
Pygments==2.10.0
pymongo==3.12.1
pyparsing==2.4.7
pyrsistent==0.18.0
python-dateutil==2.8.2
pytz==2021.3
PyYAML==5.4.1
pyzmq==22.3.0
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
s3transfer==0.5.0
scipy==1.7.2
Send2Trash==1.8.0
six==1.15.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.6.2
tensorflow-data-validation==1.3.0
tensorflow-estimator==2.6.0
tensorflow-hub==0.12.0
tensorflow-io==0.21.0
tensorflow-io-gcs-filesystem==0.21.0
tensorflow-metadata==1.2.0
tensorflow-model-analysis==0.34.1
tensorflow-serving-api==2.6.2
tensorflow-transform==1.3.0
termcolor==1.1.0
terminado==0.12.1
testpath==0.5.0
tfx==1.3.3
tfx-bsl==1.3.0
tornado==6.1
traitlets==5.1.1
typing-extensions==3.7.4.3
uritemplate==3.0.1
urllib3==1.26.7
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.2.1
Werkzeug==2.0.2
widgetsnbextension==3.5.2
wrapt==1.12.1
zipp==3.6.0
Describe the current behavior
Pipeline run fails with error:
ValueError: Unexpected type <class 'list'>
This happens while TFX is handling an earlier error and tries to log it:
tensorflow.python.framework.errors_impl.NotFoundError: Object s3://bucket/prefix/CsvExampleGen/.system/executor_execution/1/.temp/ does not exist
Describe the expected behavior
Pipeline artifacts should be generated in the s3://bucket/prefix/ directory.
Standalone code to reproduce the issue
./in/test.csv
name1,name2
1,3
2,4
./main.py
from tfx import v1 as tfx
from tfx.components import CsvExampleGen
import tensorflow_io  # init s3:// protocol support


def create_pipeline():
    example_gen = CsvExampleGen(input_base="./in")
    return tfx.dsl.Pipeline(
        pipeline_name="test",
        pipeline_root="s3://bucket/prefix/",
        components=[example_gen],
        metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(
            "./metadata.db"
        ),
    )


pipeline = create_pipeline()
tfx.orchestration.LocalDagRunner().run(pipeline)
Other info / logs
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
Traceback (most recent call last):
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/dsl/io/plugins/tensorflow_gfile.py", line 97, in rmtree
    tf.io.gfile.rmtree(path)
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py", line 677, in delete_recursively_v2
    _pywrap_file_io.DeleteRecursively(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.NotFoundError: Object s3://bucket/prefix/CsvExampleGen/.system/executor_execution/1/.temp/ does not exist

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/portable/launcher.py", line 466, in _clean_up_stateless_execution_info
    fileio.rmtree(execution_info.tmp_dir)
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/dsl/io/fileio.py", line 105, in rmtree
    _get_filesystem(path).rmtree(path)
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/dsl/io/plugins/tensorflow_gfile.py", line 99, in rmtree
    raise filesystem.NotFoundError() from e
tfx.dsl.io.filesystem.NotFoundError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 19, in <module>
    tfx.orchestration.LocalDagRunner().run(pipeline)
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/local/local_dag_runner.py", line 90, in run
    component_launcher.launch()
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/portable/launcher.py", line 554, in launch
    self._clean_up_stateless_execution_info(execution_info)
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/portable/launcher.py", line 472, in _clean_up_stateless_execution_info
    execution_info.tmp_dir, execution_info.to_proto())
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/portable/data_types.py", line 69, in to_proto
    execution_properties=data_types_utils.build_metadata_value_dict(
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/data_types_utils.py", line 84, in build_metadata_value_dict
    result[k] = set_metadata_value(value, v)
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/data_types_utils.py", line 235, in set_metadata_value
    set_parameter_value(parameter_value, value, set_schema=False)
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/data_types_utils.py", line 304, in set_parameter_value
    parameter_value.field_value.string_value = get_value_and_set_type(
  File "/home/lukesolo/tfx-s3/.venv/lib/python3.8/site-packages/tfx/orchestration/data_types_utils.py", line 287, in get_value_and_set_type
    raise ValueError('Unexpected type %s' % type(value))
ValueError: Unexpected type <class 'list'>
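
The traceback chains three exceptions, and the one reported last (ValueError) is not the first one that occurred. The following is a minimal standard-library sketch of that pattern, not TFX's actual code: `NotFoundError`, `rmtree`, `describe`, `clean_up`, and the `some_property` key are hypothetical stand-ins chosen to mirror the names in the log. It shows how a failure while building a log message inside an `except` block replaces the original error, and how to walk the chain back to the first exception.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for the filesystem NotFoundError in the log."""


def rmtree(path):
    # Stand-in for the filesystem plugin: wrap a low-level error and
    # chain it explicitly ("direct cause" in the traceback).
    try:
        raise OSError(f"Object {path} does not exist")
    except OSError as e:
        raise NotFoundError() from e


def describe(properties):
    # Stand-in for the serialization step; it only accepts scalar values.
    for value in properties.values():
        if not isinstance(value, (int, float, str)):
            raise ValueError("Unexpected type %s" % type(value))
    return properties


def clean_up(tmp_dir, properties):
    try:
        rmtree(tmp_dir)
    except NotFoundError:
        # describe() raises before print() runs, so the caller sees
        # ValueError instead of NotFoundError.
        print("tmp dir not found:", tmp_dir, describe(properties))


def root_cause(exc):
    # Follow __cause__ / __context__ back to the first exception.
    while exc.__cause__ is not None or exc.__context__ is not None:
        exc = exc.__cause__ or exc.__context__
    return exc


try:
    clean_up("s3://bucket/prefix/.temp/", {"some_property": ["a", "b"]})
except ValueError as err:
    surfaced = err
    first = root_cause(err)
    print(type(surfaced).__name__, "<-",
          type(surfaced.__context__).__name__, "<-",
          type(first).__name__)
    # prints: ValueError <- NotFoundError <- OSError
```

Read this way, the ValueError in the log is a secondary failure in the logging path; the underlying problem is the NotFoundError raised when removing the temporary directory on S3.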
@lukesolo This is because your tensorflow_io import exists only in your pipeline definition file, not in the ExampleGen component, which runs in another process or container.
This is fixed in TFX 1.8.
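
For versions without the fix, one possible workaround is to make sure the import also happens in the process that runs the component. The sketch below uses only the standard library; the helper name `try_register_filesystem` is hypothetical, and where to call it (for example, at the top of a module that the component's process imports) is an assumption that depends on your setup.

```python
import importlib


def try_register_filesystem(module_name="tensorflow_io"):
    """Import a module for its side effects; return True on success."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Call this in code that the component's process imports,
# not only in the pipeline definition file.
if not try_register_filesystem():
    print("tensorflow_io is not importable here; s3:// paths may not resolve")
```

Returning a boolean instead of raising lets the caller decide whether a missing tensorflow_io should stop the run or only produce a warning.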
@ConverJens This is still happening in TFX 1.9.0. I'm using KubeflowDagRunner.