
Transform fails when using force_tf_compat_v1=True (while running 'Analyze/ComputeDeferredMetadata[compat_v1=True]')

Open macoun opened this issue 4 years ago • 7 comments

Hi,

Versions:
- TFX version: 0.26.1 (same with 0.26.0)
- Python version: 3.7

Describe the current behavior

When running the Transform component, even with no transformations at all, it will fail when the argument force_tf_compat_v1 is set to True.

Describe the expected behavior

Since this is happening even without any real transformation, I'd expect that the transformation (including analyze) should run through without any exception.

Standalone code to reproduce the issue

Here is a bare minimum test case that will throw an exception when force_tf_compat_v1 is set to True in the Transform component.

import os
import numpy as np
import pandas as pd
from tfx.components import CsvExampleGen
from tfx.components import StatisticsGen
from tfx.components import SchemaGen
from tfx.components import Transform
from tfx.orchestration import pipeline
from tfx.orchestration import metadata
from tfx.orchestration.beam import beam_dag_runner
from tfx.proto import example_gen_pb2


def gen_data(data_folder, num_vals):
    """ Simple data generator with 2 features (x0, x1) and one label (y) """
    os.makedirs(data_folder, exist_ok=True)

    df = pd.DataFrame({
        'x0': np.random.normal(4, 3, num_vals),
        'x1': np.random.normal(-3, 4, num_vals)})

    df['y'] = 3.14*df['x0'] + 2.71*df['x1']

    df[:int(num_vals*0.8)].to_csv(f'{data_folder}/train.csv', index=False)
    df[int(num_vals*0.8):].to_csv(f'{data_folder}/eval.csv', index=False)

def preprocessing_fn(inputs):
    """ Dummy function. Doesn't do any transformation 
        but the Transform component still fails if force_tf_compat_v1=True """
    return inputs

def create_pipeline(data_folder, pipeline_name, pipeline_root):
    """ Creates a simple pipeline to illustrate the problem in the Transform component """

    # ExampleGen
    input_config = example_gen_pb2.Input(splits=[
        example_gen_pb2.Input.Split(name='train', pattern='train.csv'),
        example_gen_pb2.Input.Split(name='eval', pattern='eval.csv'),
    ])
    example_gen = CsvExampleGen(input_base=data_folder, input_config=input_config)

    # Statistics
    statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])

    # Schema
    schema_gen = SchemaGen(
        statistics=statistics_gen.outputs['statistics'],
        infer_feature_shape=True)

    # Transform
    transform = Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        preprocessing_fn='__main__.preprocessing_fn',
        # Setting force_tf_compat_v1=False works.
        # Setting it to True will fail.
        force_tf_compat_v1=True,
    )

    # Metadata
    metadata_db = f'{pipeline_root}/metadata.db'
    metadata_connection_config = metadata.sqlite_metadata_connection_config(metadata_db)

    return pipeline.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[
            example_gen,
            statistics_gen,
            schema_gen,
            transform,
        ],
        enable_cache=False,
        metadata_connection_config=metadata_connection_config)


if __name__ == '__main__':
    gen_data('data', 1000)
    beam_dag_runner.BeamDagRunner().run(
        create_pipeline('data', 'test', './test-pipeline-root'))

Other info / logs

Traceback (most recent call last):
  apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/Users/macon/Projects/Pipelines/pipeline-poc/venv/lib/python3.7/site-packages/apache_beam/transforms/core.py", line 1590, in <lambda>
    wrapper = lambda x, *args, **kwargs: [fn(x, *args, **kwargs)]
  File "/Users/macon/Projects/Pipelines/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 782, in _infer_metadata_from_saved_model
    return _infer_metadata_from_saved_model_v1(saved_model_dir)
  File "/Users/macon/Projects/Pipelines/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 796, in _infer_metadata_from_saved_model_v1
    session.run(tf.compat.v1.tables_initializer())
  File "/Users/macon/Projects/Pipelines/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "/Users/macon/Projects/Pipelines/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/macon/Projects/Pipelines/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/Users/macon/Projects/Pipelines/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
RuntimeError: tensorflow.python.framework.errors_impl.NotFoundError: No attr named 'NoOp' in NodeDef: [[node transform/inputs/x0/x0 (defined at /pipeline-poc/venv/lib/python3.7/site-packages/tensorflow_transform/saved/saved_transform_io.py:271) ]]

Original stack trace for 'transform/inputs/x0/x0':
  File "/my-ds-project/transformer_test.py", line 81, in <module>
    create_pipeline('data', 'test', './test-pipeline-root'))
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tfx/orchestration/beam/beam_dag_runner.py", line 327, in run
    logging.info('Node %s is scheduled.', node_id)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/apache_beam/pipeline.py", line 582, in __exit__
    self.result = self.run()
  File "/pipeline-poc/venv/lib/python3.7/site-packages/apache_beam/pipeline.py", line 561, in run
    return self.runner.run_pipeline(self, self._options)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 126, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 183, in run_pipeline
    pipeline.to_runner_api(default_environment=self._default_environment))
  File "/pipeline-poc/venv/lib/python3.7/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 193, in run_via_runner_api
  ...
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/framework/meta_graph.py", line 799, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 513, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 243, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3624, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3624, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3510, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/pipeline-poc/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
    self._traceback = tf_stack.extract_stack() [while running 'Analyze/ComputeDeferredMetadata[compat_v1=True]']
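
For anyone triaging a long trace like this one, the Beam stage label at the very end is usually the quickest pointer to where the failure happened. Here is a minimal sketch for pulling it out of an error message, assuming the label always has the form [while running '...'] as it does above. The helper name beam_stage_from_error is hypothetical and not part of Beam or TFX.

```python
import re
from typing import Optional

# Matches the suffix seen at the end of the trace above, e.g.
# "[while running 'Analyze/ComputeDeferredMetadata[compat_v1=True]']".
# Assumption: the stage label itself never contains a single quote.
_STAGE_RE = re.compile(r"\[while running '([^']+)'\]")


def beam_stage_from_error(message: str) -> Optional[str]:
    """Return the last stage label found in an error message, or None."""
    matches = _STAGE_RE.findall(message)
    return matches[-1] if matches else None


error_text = (
    "self._traceback = tf_stack.extract_stack() "
    "[while running 'Analyze/ComputeDeferredMetadata[compat_v1=True]']"
)
print(beam_stage_from_error(error_text))
```

Taking the last match is a deliberate choice: if an exception is re-raised through several layers, the final label is the one printed closest to the failure summary.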

macoun avatar Feb 06 '21 13:02 macoun

@macoun, I tried to reproduce this but couldn't, using: TFX 0.26.1, Python 3.7.9, TensorFlow Transform 0.26.0, TensorFlow 2.3.2.

Running pipeline:
 %s pipeline_info {
  id: "test"
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.example_gen.csv_example_gen.component.CsvExampleGen"
      }
      id: "CsvExampleGen"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "test"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "20210208-112506.803665"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "test.CsvExampleGen"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "examples"
        value {
          artifact_spec {
            type {
              name: "Examples"
              properties {
                key: "span"
                value: INT
              }
              properties {
                key: "split_names"
                value: STRING
              }
              properties {
                key: "version"
                value: INT
              }
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "input_base"
        value {
          field_value {
            string_value: "data"
          }
        }
      }
      parameters {
        key: "input_config"
        value {
          field_value {
            string_value: "{\n  \"splits\": [\n    {\n      \"name\": \"train\",\n      \"pattern\": \"train.csv\"\n    },\n    {\n      \"name\": \"eval\",\n      \"pattern\": \"eval.csv\"\n    }\n  ]\n}"
          }
        }
      }
      parameters {
        key: "output_config"
        value {
          field_value {
            string_value: "{}"
          }
        }
      }
      parameters {
        key: "output_data_format"
        value {
          field_value {
            int_value: 6
          }
        }
      }
    }
    downstream_nodes: "StatisticsGen"
    downstream_nodes: "Transform"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.statistics_gen.component.StatisticsGen"
      }
      id: "StatisticsGen"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "test"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "20210208-112506.803665"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "test.StatisticsGen"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "examples"
        value {
          channels {
            producer_node_query {
              id: "CsvExampleGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "test"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "20210208-112506.803665"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "test.CsvExampleGen"
                }
              }
            }
            artifact_query {
              type {
                name: "Examples"
              }
            }
            output_key: "examples"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "statistics"
        value {
          artifact_spec {
            type {
              name: "ExampleStatistics"
              properties {
                key: "span"
                value: INT
              }
              properties {
                key: "split_names"
                value: STRING
              }
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "exclude_splits"
        value {
          field_value {
            string_value: "[]"
          }
        }
      }
    }
    upstream_nodes: "CsvExampleGen"
    downstream_nodes: "SchemaGen"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.schema_gen.component.SchemaGen"
      }
      id: "SchemaGen"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "test"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "20210208-112506.803665"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "test.SchemaGen"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "statistics"
        value {
          channels {
            producer_node_query {
              id: "StatisticsGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "test"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "20210208-112506.803665"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "test.StatisticsGen"
                }
              }
            }
            artifact_query {
              type {
                name: "ExampleStatistics"
              }
            }
            output_key: "statistics"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "schema"
        value {
          artifact_spec {
            type {
              name: "Schema"
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "exclude_splits"
        value {
          field_value {
            string_value: "[]"
          }
        }
      }
      parameters {
        key: "infer_feature_shape"
        value {
          field_value {
            int_value: 1
          }
        }
      }
    }
    upstream_nodes: "StatisticsGen"
    downstream_nodes: "Transform"
    execution_options {
      caching_options {
      }
    }
  }
}
nodes {
  pipeline_node {
    node_info {
      type {
        name: "tfx.components.transform.component.Transform"
      }
      id: "Transform"
    }
    contexts {
      contexts {
        type {
          name: "pipeline"
        }
        name {
          field_value {
            string_value: "test"
          }
        }
      }
      contexts {
        type {
          name: "pipeline_run"
        }
        name {
          field_value {
            string_value: "20210208-112506.803665"
          }
        }
      }
      contexts {
        type {
          name: "node"
        }
        name {
          field_value {
            string_value: "test.Transform"
          }
        }
      }
    }
    inputs {
      inputs {
        key: "examples"
        value {
          channels {
            producer_node_query {
              id: "CsvExampleGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "test"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "20210208-112506.803665"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "test.CsvExampleGen"
                }
              }
            }
            artifact_query {
              type {
                name: "Examples"
              }
            }
            output_key: "examples"
          }
        }
      }
      inputs {
        key: "schema"
        value {
          channels {
            producer_node_query {
              id: "SchemaGen"
            }
            context_queries {
              type {
                name: "pipeline"
              }
              name {
                field_value {
                  string_value: "test"
                }
              }
            }
            context_queries {
              type {
                name: "pipeline_run"
              }
              name {
                field_value {
                  string_value: "20210208-112506.803665"
                }
              }
            }
            context_queries {
              type {
                name: "node"
              }
              name {
                field_value {
                  string_value: "test.SchemaGen"
                }
              }
            }
            artifact_query {
              type {
                name: "Schema"
              }
            }
            output_key: "schema"
          }
        }
      }
    }
    outputs {
      outputs {
        key: "transform_graph"
        value {
          artifact_spec {
            type {
              name: "TransformGraph"
            }
          }
        }
      }
      outputs {
        key: "transformed_examples"
        value {
          artifact_spec {
            type {
              name: "Examples"
              properties {
                key: "span"
                value: INT
              }
              properties {
                key: "split_names"
                value: STRING
              }
              properties {
                key: "version"
                value: INT
              }
            }
          }
        }
      }
      outputs {
        key: "updated_analyzer_cache"
        value {
          artifact_spec {
            type {
              name: "TransformCache"
            }
          }
        }
      }
    }
    parameters {
      parameters {
        key: "custom_config"
        value {
          field_value {
            string_value: "null"
          }
        }
      }
      parameters {
        key: "force_tf_compat_v1"
        value {
          field_value {
            int_value: 1
          }
        }
      }
      parameters {
        key: "preprocessing_fn"
        value {
          field_value {
            string_value: "__main__.preprocessing_fn"
          }
        }
      }
    }
    upstream_nodes: "CsvExampleGen"
    upstream_nodes: "SchemaGen"
    execution_options {
      caching_options {
      }
    }
  }
}
runtime_spec {
  pipeline_root {
    field_value {
      string_value: "./test-pipeline-root"
    }
  }
  pipeline_run_id {
    field_value {
      string_value: "20210208-112506.803665"
    }
  }
}
execution_mode: SYNC
deployment_config {
  type_url: "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig"
  value: "\n\214\001\n\tTransform\022\177\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\022,\n*tfx.components.transform.executor.Executor\n\226\001\n\rStatisticsGen\022\204\001\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\0221\n/tfx.components.statistics_gen.executor.Executor\n\216\001\n\tSchemaGen\022\200\001\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\022-\n+tfx.components.schema_gen.executor.Executor\n\243\001\n\rCsvExampleGen\022\221\001\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\022>\n<tfx.components.example_gen.csv_example_gen.executor.Executor\022\216\001\n\rCsvExampleGen\022}\nOtype.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec\022*\n(tfx.components.example_gen.driver.Driver*Z\n0type.googleapis.com/ml_metadata.ConnectionConfig\022&\032$\n ./test-pipeline-root/metadata.db\020\003"
}

WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/home/AG00638479/.local/share/jupyter/runtime/kernel-d2e5d356-7225-44f6-ae9d-6be3600c048b.json']
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/home/AG00638479/.local/share/jupyter/runtime/kernel-d2e5d356-7225-44f6-ae9d-6be3600c048b.json']
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/home/AG00638479/.local/share/jupyter/runtime/kernel-d2e5d356-7225-44f6-ae9d-6be3600c048b.json']
WARNING:tensorflow:From /home/AG00638479/.conda/envs/tfx_0.26.0/lib/python3.7/site-packages/tensorflow_data_validation/utils/stats_util.py:247: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
WARNING:apache_beam.options.pipeline_options:Discarding unparseable args: ['-f', '/home/AG00638479/.local/share/jupyter/runtime/kernel-d2e5d356-7225-44f6-ae9d-6be3600c048b.json']
WARNING:absl:The default value of `force_tf_compat_v1` will change in a future release from `True` to `False`. Since this pipeline has TF 2 behaviors enabled, Transform will use native TF 2 at that point. You can test this behavior now by passing `force_tf_compat_v1=False` or disable it by explicitly setting `force_tf_compat_v1=True` in the Transform component.
WARNING:tensorflow:From /home/AG00638479/.conda/envs/tfx_0.26.0/lib/python3.7/site-packages/tfx/components/transform/executor.py:541: Schema (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
Schema is a deprecated, use schema_utils.schema_from_feature_spec to create a `Schema`
WARNING:tensorflow:From /home/AG00638479/.conda/envs/tfx_0.26.0/lib/python3.7/site-packages/tensorflow_transform/tf_utils.py:261: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:TFT beam APIs accept both the TFXIO format and the instance dict format now. There is no need to set use_tfxio any more and it will be removed soon.
WARNING:root:This output type hint will be ignored and not used for type-checking purposes. Typically, output type hints for a PTransform are single (or nested) types wrapped by a PCollection, PDone, or None. Got: Tuple[Dict[str, Union[NoneType, _Dataset]], Union[Dict[str, Dict[str, PCollection]], NoneType]] instead.
WARNING:root:This output type hint will be ignored and not used for type-checking purposes. Typically, output type hints for a PTransform are single (or nested) types wrapped by a PCollection, PDone, or None. Got: Tuple[Dict[str, Union[NoneType, _Dataset]], Union[Dict[str, Dict[str, PCollection]], NoneType]] instead.
WARNING:tensorflow:Tensorflow version (2.3.2) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
WARNING:tensorflow:From /home/AG00638479/.conda/envs/tfx_0.26.0/lib/python3.7/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./test-pipeline-root/Transform/transform_graph/8/.temp_path/tftransform_tmp/89012cf61f4a4b2eae870e285b80e5d9/saved_model.pb
WARNING:tensorflow:Tensorflow version (2.3.2) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:tensorflow:Tensorflow version (2.3.2) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore

arghyaganguly avatar Feb 08 '21 11:02 arghyaganguly

Thank you @arghyaganguly for your answer.

I tried a fresh install and at first got stuck in StatisticsGen (because of the new numpy release, I guess, but please read on).

The platform I'm testing with is:

Platform: macOS (10.15.7)
Python: Python 3.7.4 (v3.7.4:e09359112e, Jul 8 2019, 14:54:52) [Clang 6.0 (clang-600.0.57)] on darwin

I can reproduce it with the following instructions:

python3.7 -m venv venv
source venv/bin/activate
pip install tfx==0.26.1

The pip install outputs 3 warnings:

...
apache-beam 2.27.0 has requirement httplib2<0.18.0,>=0.8, but you'll have httplib2 0.19.0 which is incompatible.
tensorboard 2.4.1 has requirement setuptools>=41.0.0, but you'll have setuptools 40.8.0 which is incompatible.
tensorflow 2.3.2 has requirement numpy<1.19.0,>=1.16.0, but you'll have numpy 1.20.1 which is incompatible.
...
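
These conflict warnings are easy to miss in a long install log. As a rough sketch, they could be collected programmatically like this, assuming pip words them exactly as in the three lines above (other pip versions may phrase them differently). Conflict and parse_conflicts are hypothetical names used only for illustration.

```python
import re
from typing import List, NamedTuple


class Conflict(NamedTuple):
    package: str      # package that declares the requirement
    requirement: str  # the requirement as pip printed it
    installed: str    # dependency name and version pip actually installed


# Mirrors the wording of the three warnings quoted above.
_CONFLICT_RE = re.compile(
    r"^(?P<package>\S+) \S+ has requirement (?P<requirement>.+?), "
    r"but you'll have (?P<installed>\S+ \S+) which is incompatible\.$"
)


def parse_conflicts(pip_output: str) -> List[Conflict]:
    """Extract dependency conflicts from pip's install output."""
    conflicts = []
    for line in pip_output.splitlines():
        match = _CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(Conflict(**match.groupdict()))
    return conflicts


pip_output = """\
apache-beam 2.27.0 has requirement httplib2<0.18.0,>=0.8, but you'll have httplib2 0.19.0 which is incompatible.
tensorboard 2.4.1 has requirement setuptools>=41.0.0, but you'll have setuptools 40.8.0 which is incompatible.
tensorflow 2.3.2 has requirement numpy<1.19.0,>=1.16.0, but you'll have numpy 1.20.1 which is incompatible.
"""
for conflict in parse_conflicts(pip_output):
    print(f"{conflict.package}: wants {conflict.requirement}, has {conflict.installed}")
```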

A pip freeze shows the following packages:

absl-py==0.10.0
apache-beam==2.27.0
appnope==0.1.2
argon2-cffi==20.1.0
astunparse==1.6.3
async-generator==1.10
attrs==20.3.0
avro-python3==1.9.2.1
backcall==0.2.0
bleach==3.3.0
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.4
chardet==4.0.0
click==7.1.2
colorama==0.4.4
crcmod==1.7
decorator==4.4.2
defusedxml==0.6.0
dill==0.3.1.1
docker==4.4.1
docopt==0.6.2
entrypoints==0.3
fastavro==1.3.1
fasteners==0.16
future==0.18.2
gast==0.3.3
google-api-core==1.25.1
google-api-python-client==1.12.8
google-apitools==0.5.31
google-auth==1.25.0
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.2
google-cloud-bigquery==1.28.0
google-cloud-bigtable==1.6.1
google-cloud-build==2.0.0
google-cloud-core==1.6.0
google-cloud-datastore==1.15.3
google-cloud-dlp==1.0.0
google-cloud-language==1.3.0
google-cloud-pubsub==1.7.0
google-cloud-spanner==1.19.1
google-cloud-storage==1.35.1
google-cloud-videointelligence==1.16.1
google-cloud-vision==1.0.0
google-crc32c==1.1.2
google-pasta==0.2.0
google-resumable-media==1.2.0
googleapis-common-protos==1.52.0
grpc-google-iam-v1==0.12.3
grpcio==1.35.0
grpcio-gcp==0.2.2
h5py==2.10.0
hdfs==2.5.8
httplib2==0.19.0
idna==2.10
importlib-metadata==3.4.0
ipykernel==5.4.3
ipython==7.20.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==2.11.3
joblib==0.14.1
jsonschema==3.2.0
jupyter-client==6.1.11
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
Keras-Preprocessing==1.1.2
keras-tuner==1.0.1
kubernetes==11.0.0
libcst==0.3.16
Markdown==3.3.3
MarkupSafe==1.1.1
mistune==0.8.4
ml-metadata==0.26.0
ml-pipelines-sdk==0.26.1
mock==2.0.0
mypy-extensions==0.4.3
nbclient==0.5.1
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
notebook==6.2.0
numpy==1.20.1
oauth2client==4.1.3
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.9
pandas==1.2.1
pandocfilters==1.4.3
parso==0.8.1
pbr==5.5.1
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.9.0
promise==2.3
prompt-toolkit==3.0.14
proto-plus==1.13.0
protobuf==3.14.0
ptyprocess==0.7.0
pyarrow==0.17.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pydot==1.4.1
Pygments==2.7.4
pymongo==3.11.3
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
pyzmq==22.0.2
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7
scikit-learn==0.24.1
scipy==1.6.0
Send2Trash==1.5.0
six==1.15.0
tabulate==0.8.7
tensorboard==2.4.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.3.2
tensorflow-cloud==0.1.12
tensorflow-data-validation==0.26.0
tensorflow-datasets==3.0.0
tensorflow-estimator==2.3.0
tensorflow-hub==0.9.0
tensorflow-metadata==0.26.0
tensorflow-model-analysis==0.26.0
tensorflow-serving-api==2.3.0
tensorflow-transform==0.26.0
termcolor==1.1.0
terminado==0.9.2
terminaltables==3.1.0
testpath==0.4.4
tfx==0.26.1
tfx-bsl==0.26.1
threadpoolctl==2.1.0
tornado==6.1
tqdm==4.56.0
traitlets==5.0.5
typing-extensions==3.7.4.3
typing-inspect==0.6.0
uritemplate==3.0.1
urllib3==1.26.3
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==0.57.0
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
zipp==3.4.0

Running the test code like

python main.py # main.py contains the above test code

Produces the following error now:

...
  File "/Users/macoun/Projects/Pipelines/transform-test/venv/lib/python3.7/site-packages/tensorflow_data_validation/statistics/stats_impl.py", line 667, in <lambda>
    lambda gen, gen_acc: gen.add_input(gen_acc, record_batch),
  File "/Users/macoun/Projects/Pipelines/transform-test/venv/lib/python3.7/site-packages/tensorflow_data_validation/statistics/generators/basic_stats_generator.py", line 1023, in add_input
    weights)
  File "/Users/macoun/Projects/Pipelines/transform-test/venv/lib/python3.7/site-packages/tensorflow_data_validation/statistics/generators/basic_stats_generator.py", line 232, in update
    weights)
  File "/Users/macoun/Projects/Pipelines/transform-test/venv/lib/python3.7/site-packages/tensorflow_data_validation/statistics/generators/basic_stats_generator.py", line 128, in update
    num_values_grouped = pa.array(num_values_not_none).value_counts()
  File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow/types.pxi", line 76, in pyarrow.lib._datatype_to_pep3118
  File "pyarrow/array.pxi", line 64, in pyarrow.lib._ndarray_to_type
  File "pyarrow/error.pxi", line 108, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Did not pass numpy.dtype object [while running 'Run[StatisticsGen]']

There is no other custom code or configuration involved. Since the pip warning indicated a numpy conflict, I've re-installed the correct numpy version.

pip install --upgrade 'numpy<1.19.0,>=1.16.0'
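One way to confirm that the reinstalled numpy really falls inside the `>=1.16.0,<1.19.0` range is to compare version tuples. This is only a minimal sketch that assumes plain dotted release numbers; `parse_version` and `in_range` are hypothetical helpers written for this illustration, not part of pip, numpy, or TFX.

```python
import re


def parse_version(text):
    """Turn '1.18.5' into (1, 18, 5), padding to at least three parts.

    Hypothetical helper: parsing stops at the first piece that does not
    start with a digit, so it only suits simple release numbers.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts + [0] * max(0, 3 - len(parts)))


def in_range(version, lower, upper):
    """Return True when lower <= version < upper (upper bound exclusive)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed in this environment")
    else:
        ok = in_range(numpy.__version__, "1.16.0", "1.19.0")
        print(f"numpy {numpy.__version__} within [1.16.0, 1.19.0): {ok}")
```

A full implementation would follow PEP 440 (pre-releases, epochs, local versions); the tuple comparison above ignores all of that.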

And now I'm back to the actual problem when I run python main.py again:

...
  File "/Users/macoun/Projects/Pipelines/transform-test/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3624, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/Users/macoun/Projects/Pipelines/transform-test/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3510, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/Users/macoun/Projects/Pipelines/transform-test/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
    self._traceback = tf_stack.extract_stack() [while running 'Analyze/ComputeDeferredMetadata[compat_v1=True]']

macoun avatar Feb 08 '21 12:02 macoun

@macoun, I used numpy 1.18.5 for my attempt, in a conda-based environment on Ubuntu 18.04.5. pip freeze output:

Package                            Version
---------------------------------- -------------------
absl-py                            0.10.0
alabaster                          0.7.12
anaconda-client                    1.7.2
anaconda-project                   0.9.1
anyio                              2.1.0
apache-beam                        2.27.0
appdirs                            1.4.4
argh                               0.26.2
argon2-cffi                        20.1.0
asn1crypto                         1.4.0
astroid                            2.4.2
astropy                            4.2
astunparse                         1.6.3
async-generator                    1.10
atomicwrites                       1.4.0
attrs                              20.3.0
autopep8                           1.5.5
avro-python3                       1.9.2.1
Babel                              2.9.0
backcall                           0.2.0
backports.functools-lru-cache      1.6.1
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4                     4.9.3
bitarray                           1.6.3
bkcharts                           0.2
black                              20.8b1
bleach                             3.3.0
bokeh                              2.2.3
boto                               2.49.0
Bottleneck                         1.3.2
brotlipy                           0.7.0
cached-property                    1.5.1
cachetools                         4.2.1
certifi                            2020.6.20
cffi                               1.14.4
chardet                            4.0.0
click                              7.1.2
cloudpickle                        1.6.0
clyent                             1.2.2
colorama                           0.4.4
contextlib2                        0.6.0.post1
crcmod                             1.7
cryptography                       3.3.1
cycler                             0.10.0
Cython                             0.29.21
cytoolz                            0.11.0
dask                               2021.2.0
decorator                          4.4.2
defusedxml                         0.6.0
diff-match-patch                   20200713
dill                               0.3.1.1
distributed                        2021.2.0
docker                             4.4.1
docopt                             0.6.2
docutils                           0.16
entrypoints                        0.3
et-xmlfile                         1.0.1
facets-overview                    1.0.0
fastavro                           1.3.1
fastcache                          1.1.0
fasteners                          0.16
filelock                           3.0.12
flake8                             3.8.4
Flask                              1.1.2
fsspec                             0.8.5
future                             0.18.2
gast                               0.3.3
gevent                             21.1.2
glob2                              0.7
gmpy2                              2.1.0b1
google-api-core                    1.25.1
google-api-python-client           1.12.8
google-apitools                    0.5.31
google-auth                        1.25.0
google-auth-httplib2               0.0.4
google-auth-oauthlib               0.4.2
google-cloud-bigquery              1.28.0
google-cloud-bigtable              1.6.1
google-cloud-build                 2.0.0
google-cloud-core                  1.6.0
google-cloud-datastore             1.15.3
google-cloud-dlp                   1.0.0
google-cloud-language              1.3.0
google-cloud-pubsub                1.7.0
google-cloud-spanner               1.19.1
google-cloud-storage               1.35.1
google-cloud-videointelligence     1.16.1
google-cloud-vision                1.0.0
google-crc32c                      1.1.2
google-pasta                       0.2.0
google-resumable-media             1.2.0
googleapis-common-protos           1.52.0
greenlet                           0.4.17
grpc-google-iam-v1                 0.12.3
grpcio                             1.35.0
grpcio-gcp                         0.2.2
h5py                               2.10.0
hdfs                               2.5.8
HeapDict                           1.0.1
helpdev                            0.7.1
html5lib                           1.1
httplib2                           0.17.4
idna                               2.10
imagecodecs                        2021.1.11
imageio                            2.9.0
imagesize                          1.2.0
importlib-metadata                 3.4.0
iniconfig                          1.1.1
intervaltree                       3.0.2
ipykernel                          5.3.4
ipython                            7.20.0
ipython-genutils                   0.2.0
ipywidgets                         7.6.3
isort                              5.6.4
itsdangerous                       1.1.0
jdcal                              1.4.1
jedi                               0.17.2
jeepney                            0.6.0
Jinja2                             2.11.3
joblib                             0.14.1
json5                              0.9.5
jsonschema                         3.2.0
jupyter                            1.0.0
jupyter-client                     6.1.11
jupyter-console                    6.2.0
jupyter-core                       4.7.1
jupyter-server                     1.3.0
jupyterlab                         3.0.7
jupyterlab-pygments                0.1.2
jupyterlab-server                  2.2.0
jupyterlab-widgets                 1.0.0
Keras-Preprocessing                1.1.2
keras-tuner                        1.0.1
keyring                            22.0.1
kiwisolver                         1.3.1
kubernetes                         11.0.0
lazy-object-proxy                  1.4.3
libarchive-c                       2.9
libcst                             0.3.16
llvmlite                           0.35.0
locket                             0.2.0
lxml                               4.6.2
Markdown                           3.3.3
MarkupSafe                         1.1.1
matplotlib                         3.3.4
mccabe                             0.6.1
mistune                            0.8.4
mkl-fft                            1.2.0
mkl-random                         1.2.0
mkl-service                        2.3.0
ml-metadata                        0.26.0
ml-pipelines-sdk                   0.26.1
mock                               2.0.0
more-itertools                     8.7.0
mpmath                             1.1.0
msgpack                            1.0.2
multipledispatch                   0.6.0
mypy-extensions                    0.4.3
nbclassic                          0.2.6
nbclient                           0.5.1
nbconvert                          6.0.7
nbformat                           5.1.2
nest-asyncio                       1.4.3
networkx                           2.5
nltk                               3.4.4
nose                               1.3.7
notebook                           6.2.0
numba                              0.52.0
numexpr                            2.7.2
numpy                              1.18.5
numpydoc                           1.1.0
oauth2client                       4.1.3
oauthlib                           3.1.0
olefile                            0.46
openpyxl                           3.0.6
opt-einsum                         3.3.0
packaging                          20.9
pandas                             1.2.1
pandocfilters                      1.4.2
parso                              0.7.0
partd                              1.1.0
path                               15.1.0
pathlib2                           2.3.5
pathspec                           0.8.1
pathtools                          0.1.2
patsy                              0.5.1
pbr                                5.5.1
pep8                               1.7.1
pexpect                            4.8.0
pickleshare                        0.7.5
Pillow                             8.1.0
pip                                21.0.1
pkginfo                            1.7.0
pluggy                             0.13.1
ply                                3.11
pooch                              1.3.0
prometheus-client                  0.9.0
promise                            2.3
prompt-toolkit                     3.0.14
proto-plus                         1.13.0
protobuf                           3.14.0
psutil                             5.8.0
ptyprocess                         0.7.0
py                                 1.10.0
pyarrow                            0.17.1
pyasn1                             0.4.8
pyasn1-modules                     0.2.8
pycodestyle                        2.6.0
pycosat                            0.6.3
pycparser                          2.20
pycrypto                           2.6.1
pycurl                             7.43.0.6
pydocstyle                         5.1.1
pydot                              1.4.1
pyerfa                             1.7.2
pyflakes                           2.2.0
Pygments                           2.7.4
pylint                             2.6.0
pyls-black                         0.4.6
pyls-spyder                        0.3.0
pymongo                            3.11.3
pyodbc                             4.0.30
pyOpenSSL                          20.0.1
pyparsing                          2.4.7
PyQt5                              5.12.3
PyQt5-sip                          4.19.18
PyQtChart                          5.12
PyQtWebEngine                      5.12.1
pyrsistent                         0.17.3
PySocks                            1.7.1
pytest                             6.2.2
python-dateutil                    2.8.1
python-jsonrpc-server              0.4.0
python-language-server             0.36.2
pytz                               2021.1
PyWavelets                         1.1.1
pyxdg                              0.26
PyYAML                             5.4.1
pyzmq                              22.0.1
QDarkStyle                         2.8.1
QtAwesome                          1.0.2
qtconsole                          5.0.2
QtPy                               1.9.0
regex                              2020.11.13
requests                           2.25.1
requests-oauthlib                  1.3.0
rope                               0.18.0
rsa                                4.7
Rtree                              0.9.7
ruamel-yaml-conda                  0.15.80
scikit-image                       0.18.1
scikit-learn                       0.24.1
scipy                              1.5.3
seaborn                            0.11.1
SecretStorage                      3.3.0
Send2Trash                         1.5.0
setuptools                         49.6.0.post20210108
simplegeneric                      0.8.1
singledispatch                     3.4.0.3
sip                                4.19.24
six                                1.15.0
sniffio                            1.2.0
snowballstemmer                    2.1.0
sortedcollections                  2.1.0
sortedcontainers                   2.3.0
soupsieve                          2.0.1
Sphinx                             3.4.3
sphinxcontrib-applehelp            1.0.2
sphinxcontrib-devhelp              1.0.2
sphinxcontrib-htmlhelp             1.0.3
sphinxcontrib-jsmath               1.0.1
sphinxcontrib-qthelp               1.0.3
sphinxcontrib-serializinghtml      1.1.4
sphinxcontrib-websupport           1.2.4
spyder                             4.2.1
spyder-kernels                     1.10.1
SQLAlchemy                         1.3.23
statsmodels                        0.12.2
sympy                              1.7.1
tables                             3.6.1
tabulate                           0.8.7
tblib                              1.6.0
tensorboard                        2.4.1
tensorboard-plugin-wit             1.8.0
tensorflow                         2.3.2
tensorflow-cloud                   0.1.12
tensorflow-data-validation         0.26.0
tensorflow-datasets                3.0.0
tensorflow-estimator               2.3.0
tensorflow-hub                     0.9.0
tensorflow-metadata                0.26.0
tensorflow-model-analysis          0.26.0
tensorflow-serving-api             2.3.0
tensorflow-transform               0.26.0
termcolor                          1.1.0
terminado                          0.9.2
terminaltables                     3.1.0
testpath                           0.4.4
textdistance                       4.2.1
tfx                                0.26.1
tfx-bsl                            0.26.1
threadpoolctl                      2.1.0
three-merge                        0.1.1
tifffile                           2021.2.1
timeloop                           1.0.2
toml                               0.10.2
toolz                              0.11.1
tornado                            6.1
tqdm                               4.56.0
traitlets                          5.0.5
typed-ast                          1.4.1
typing-extensions                  3.7.4.3
typing-inspect                     0.6.0
ujson                              4.0.2
unicodecsv                         0.14.1
uritemplate                        3.0.1
urllib3                            1.26.3
watchdog                           1.0.2
wcwidth                            0.2.5
webencodings                       0.5.1
websocket-client                   0.57.0
Werkzeug                           1.0.1
wheel                              0.36.2
widgetsnbextension                 3.5.1
wrapt                              1.12.1
wurlitzer                          2.0.1
xlrd                               2.0.1
XlsxWriter                         1.3.7
xlwt                               1.3.0
yapf                               0.30.0
zict                               2.0.0
zipp                               3.4.0
zope.event                         4.5.0
zope.interface                     5.2.0
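Comparing two environment listings like the ones in this thread by eye is error-prone, since one uses the `name==version` format and the other a two-column table. The sketch below parses both formats and reports packages whose versions differ. `parse_listing` and `diff_environments` are hypothetical helpers, and the package `example-pkg` in the sample data is made up purely for illustration.

```python
def parse_listing(text):
    """Parse 'name==version' lines or 'name   version' table rows into a dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the dashed separator row, and the table header row.
        if not line or line.startswith("-") or line.lower().startswith("package "):
            continue
        if "==" in line:
            name, version = line.split("==", 1)
        else:
            fields = line.split()
            if len(fields) != 2:
                continue  # not a recognizable 'name version' row
            name, version = fields
        # Normalize names so 'Foo_Bar' and 'foo-bar' compare equal.
        packages[name.strip().lower().replace("_", "-")] = version.strip()
    return packages


def diff_environments(left, right):
    """Return {name: (left_version, right_version)} for shared packages that differ."""
    return {
        name: (left[name], right[name])
        for name in sorted(left.keys() & right.keys())
        if left[name] != right[name]
    }


if __name__ == "__main__":
    # Sample data: 'tfx' is taken from the thread, 'example-pkg' is invented.
    freeze_style = "tfx==0.26.1\nexample-pkg==1.0.0\n"
    table_style = (
        "Package     Version\n"
        "----------- -------\n"
        "tfx         0.26.1\n"
        "example-pkg 2.0.0\n"
    )
    differences = diff_environments(
        parse_listing(freeze_style), parse_listing(table_style)
    )
    for name, (a, b) in differences.items():
        print(f"{name}: {a} vs {b}")
```

Packages that appear in only one environment are deliberately ignored here; reporting them would need a second pass over the key sets.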

arghyaganguly avatar Feb 08 '21 14:02 arghyaganguly

I just wanted to update you that the described behavior occurs on all Macs. There are so many problems installing and running TFX on macOS with pip (version 0.27 cannot even be run without crashing with a segmentation fault) that we have the impression it is never tested in this configuration. Because many engineers and data scientists will depend on our pipeline architecture, using TFX in its current state was too risky for us, and we had to move on. Keep up the good work. We hope to see a more stable version in the future, but the current version (installed with pip on macOS) is a no-go for us.

macoun avatar Mar 05 '21 07:03 macoun

@macoun Could you please check the merged noop PR and let us know whether this can be closed? Thanks.

UsharaniPagadala avatar Oct 19 '21 07:10 UsharaniPagadala

Hi @macoun

Could you please give us the status of this issue? Thank you!

pindinagesh avatar May 31 '22 10:05 pindinagesh

The 0.26.1 version of TFX is very old and was still in beta, so my first recommendation would be to try this with the current version, which is 1.8.0. Second, the reason for using force_tf_compat_v1 is to use Transform with TensorFlow 1.X code, which is also very old. Could you update to TensorFlow 2.X?

rcrowe-google avatar May 31 '22 20:05 rcrowe-google

@macoun As mentioned above, please update to TensorFlow 2.x and TFX 1.8.0. I am closing this issue as it has been inactive for months. Please respond to the comment above and we will reopen the issue. Thanks!

gowthamkpr avatar Sep 27 '22 21:09 gowthamkpr
