local dag runner TFX pipeline run create ERROR Failed to make stateful working dir ; Protocol error

Open miroC911 opened this issue 3 years ago • 6 comments

OS: Linux Ubuntu 21.04. TensorFlow version: 2.6.2. TFX version: 1.4.0. Python: 3.8.0.

Hello, I am following Building a TFX Pipeline Locally (https://www.tensorflow.org/tfx/guide/build_local_pipeline). I am running only the CsvExampleGen component, and I get the following error:

ERROR:absl:Failed to make stateful working dir: ./my_pipeline_output/CsvExampleGen/.system/stateful_working_dir/2022-01-05T11:04:16.463569
Traceback (most recent call last):
........
  File "/home/mc/anaconda3/envs/tfx_linux/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py", line 514, in recursive_create_dir_v2
    _pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.UnknownError: ./my_pipeline_output/CsvExampleGen/.system/stateful_working_dir/2022-01-05T11:04:16.463569; Protocol error

I looked into the function recursive_create_dir_v2() in tensorflow/python/lib/io/file_io.py, but that is as far as I got! :-) I would appreciate any suggestions. I am certainly missing something ... Thanks

miroC911 avatar Jan 05 '22 12:01 miroC911

@miroC911,

Can you share the complete error trace and the line which triggered this error in the example program? Thanks!

sanatmpa1 avatar Jan 10 '22 04:01 sanatmpa1

@sanatmpa1

Hello, please see below. If you need more info please let me know. with thanks

Error trace:
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Going to run a new execution 2
ERROR:absl:Failed to make stateful working dir: ./my_pipeline_output/CsvExampleGen/.system/stateful_working_dir/2022-01-11T13:41:58.037527
Traceback (most recent call last):
  File "/home/mc/anaconda3/envs/tfx_linux/lib/python3.8/site-packages/tfx/orchestration/portable/outputs_utils.py", line 220, in get_stateful_working_directory
    fileio.makedirs(stateful_working_dir)
  File "/home/mc/anaconda3/envs/tfx_linux/lib/python3.8/site-packages/tfx/dsl/io/fileio.py", line 78, in makedirs
    _get_filesystem(path).makedirs(path)
  File "/home/mc/anaconda3/envs/tfx_linux/lib/python3.8/site-packages/tfx/dsl/io/plugins/tensorflow_gfile.py", line 71, in makedirs
    tf.io.gfile.makedirs(path)
  File "/home/mc/anaconda3/envs/tfx_linux/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py", line 514, in recursive_create_dir_v2
    _pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.UnknownError: ./my_pipeline_output/CsvExampleGen/.system/stateful_working_dir/2022-01-11T13:41:58.037527; Protocol error

Triggered by:
  File "/home/mc/git/TFX_Tutorials/TFX_tutorial/example_TFX_pipeline/my_pipeline.py", line 53, in run_pipeline
    tfx.orchestration.LocalDagRunner().run(my_pipeline)
  File "/home/mc/git/TFX_Tutorials/TFX_tutorial/example_TFX_pipeline/my_pipeline.py", line 57, in <module>
    run_pipeline()

miroC911 avatar Jan 11 '22 12:01 miroC911

@miroC911,

Can you share a simple standalone code or colab gist to reproduce the issue? Thanks!

sanatmpa1 avatar Jan 11 '22 16:01 sanatmpa1

@sanatmpa1 Hello, with thanks. https://gist.github.com/miroC911/da835e2e1c5c22b7cb1c54e530962589

miroC911 avatar Jan 12 '22 12:01 miroC911

I'm running Windows 10, and the problem is in the method LocalDagRunner.run_with_ir (tfx.orchestration.local.local_dag_runner.py). When it substitutes the runtime parameter with a concrete run_id, it replaces pipeline_run_id with datetime.datetime.now().isoformat(), which uses a colon as the separator between HH, MM, and SS. Colons are not allowed in Windows file names.
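
To see the colon problem concretely, here is a minimal Python sketch using only the standard library. The names `WINDOWS_RESERVED_CHARS` and `reserved_chars_in` are illustrative and not part of TFX, and the timestamp is the one from the first error message rather than a live call to `datetime.now()`.

```python
import datetime

# Characters that Windows rejects in file and directory names.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def reserved_chars_in(name: str) -> set:
    """Return the Windows-reserved characters that appear in `name`."""
    return WINDOWS_RESERVED_CHARS.intersection(name)


# Fixed timestamp so the result is reproducible; the runner would use now().
run_id = datetime.datetime(2022, 1, 5, 11, 4, 16, 463569).isoformat()

print(run_id)                     # 2022-01-05T11:04:16.463569
print(reserved_chars_in(run_id))  # {':'}
```

Any directory named after this run_id therefore contains a character that Windows, and possibly some other filesystems, will refuse.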

juan-sv avatar Mar 21 '22 13:03 juan-sv

See tfx/issues/4474. In tfx\orchestration\portable\outputs_utils.py, using self.pipeline_run_id.replace(':', '') in the get_stateful_working_directory function fixes the issue.
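
A rough sketch of the same idea outside TFX, assuming the run ID is just the ISO timestamp. `sanitize_run_id` is a hypothetical helper, not a TFX function, and the directory layout only mimics the path from the error message. This is not the actual patch to outputs_utils.py.

```python
import datetime
import os
import tempfile


def sanitize_run_id(run_id: str) -> str:
    """Hypothetical helper: drop colons so the ID can be used as a directory name."""
    return run_id.replace(':', '')


# Fixed timestamp taken from the second error trace, for reproducibility.
run_id = datetime.datetime(2022, 1, 11, 13, 41, 58, 37527).isoformat()
safe_id = sanitize_run_id(run_id)
print(safe_id)  # 2022-01-11T134158.037527

# Create a directory shaped like the one in the error message, under a temp root.
with tempfile.TemporaryDirectory() as root:
    working_dir = os.path.join(root, '.system', 'stateful_working_dir', safe_id)
    os.makedirs(working_dir)
    created = os.path.isdir(working_dir)

print(created)  # True
```

Stripping the colons keeps the ID sortable by time, since the digits stay in the same order.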

aurelienmorgan avatar Jun 20 '22 13:06 aurelienmorgan

@miroC911 As mentioned above, this issue is specific to Windows paths, and a workaround is mentioned here. Let's close this issue and track it here. Thanks!!

gowthamkpr avatar Oct 05 '22 18:10 gowthamkpr

Agree to close the issue. Please send any follow up request to @ruoyu90

zhitaoli avatar Oct 11 '22 16:10 zhitaoli
