ecosystem
Error when deserializing TFRecords in TF 2.x: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices
The library doesn't seem to work with TensorFlow 2.x.
My environment:
- AWS emr-5.31.0
- TF=2.1.0, Spark=3.0.1
- built the library with the following:
mvn versions:set -DnewVersion=1.15.0
mvn clean install -Dspark.version=3.0.1
The reads and the writes do not match. After persisting to AWS S3, the serialization and deserialization appear to be mismatched; perhaps the connector writes a TF 1.x-compatible format rather than a TF 2.x one?
To reproduce:
- execute in Spark, with
--jars s3://<your jars location>/spark-tensorflow-connector_2.12-1.15.0.jar
- use a Bootstrap action in EMR to get boto3 installed on the cluster; this worked for me:
#!/bin/bash
pip3 install --user boto3
- the Python tester program is attached
- I used the small movielens dataset (see the attached movies.csv and ratings.csv)
Output
>> Ratings: <class 'tensorflow.python.data.ops.readers.TFRecordDatasetV2'>; size=100004
>> Movies: <class 'tensorflow.python.data.ops.readers.TFRecordDatasetV2'>; size=9125
(Deserialization doesn't seem to be working)
********************************************************************************
(b'\ns\n\x11\n\x08movie_id\x12\x05\x1a\x03\n\x01\x01\n#\n\x0bmovie_title'
b'\x12\x14\n\x12\n\x10Toy Story (1995)\n9\n\x06genres\x12/\n-\n+Adventure|Anim'
b'ation|Children|Comedy|Fantasy')
********************************************************************************
(b'\n|\n\x10\n\x07user_id\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08movie_id'
b'\x12\x05\x1a\x03\n\x01\x1f\n)\n\x0bmovie_title\x12\x1a\n\x18\n\x16Dangerou'
b's Minds (1995)\n\x12\n\x06rating\x12\x08\x12\x06\n\x04\x00\x00 @\n\x16\n\tti'
b'mestamp\x12\t\x1a\x07\n\x05\xe8\xd0\x96\xd9\x04')
********************************************************************************
The error when running in the cluster:
Traceback (most recent call last):
  File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 300, in <module>
    main(sys.argv)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)
  File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 50, in main
    movies, test, train, unique_movie_titles, unique_user_ids = prepare_data(movies, ratings)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)
  File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 155, in prepare_data
    ratings = ratings.map(lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1695, in map
    return MapDataset(self, map_func, preserve_cardinality=True)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4045, in __init__
    use_legacy_function=use_legacy_function)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3371, in __init__
    self._function = wrapper_fn.get_concrete_function()
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 2939, in get_concrete_function
    *args, **kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 2906, in _get_concrete_function_garbage_collected
    graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3364, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3299, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
    return func(*args, **kwargs)
  File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 155, in <lambda>
    ratings = ratings.map(lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 986, in _slice_helper
    _check_index(s)
  File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 865, in _check_index
    raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got 'movie_title'
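One possible reading of the TypeError and the byte dump in the output: each dataset element is still a scalar string tensor holding a serialized `tf.train.Example`, so `x["movie_title"]` is treated as tensor slicing rather than a dictionary lookup. The following is a minimal sketch of parsing the records before indexing by key. The feature names and types are inferred from the dumped rating record and are an assumption, not taken from the attached tester program; `RATING_FEATURES`, `parse_rating`, and `make_example` are hypothetical helper names, and the in-memory example stands in for a record read from S3.

```python
import tensorflow as tf

# Feature spec inferred from the dumped rating record (assumption):
# user_id, movie_id, timestamp look like int64; rating like float; movie_title like bytes.
RATING_FEATURES = {
    "user_id": tf.io.FixedLenFeature([], tf.int64),
    "movie_id": tf.io.FixedLenFeature([], tf.int64),
    "movie_title": tf.io.FixedLenFeature([], tf.string),
    "rating": tf.io.FixedLenFeature([], tf.float32),
    "timestamp": tf.io.FixedLenFeature([], tf.int64),
}


def parse_rating(serialized):
    """Turn one serialized tf.train.Example into a dict of tensors."""
    return tf.io.parse_single_example(serialized, RATING_FEATURES)


def make_example():
    """Build a stand-in record mirroring the dumped rating (hypothetical helper)."""
    feature = {
        "user_id": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
        "movie_id": tf.train.Feature(int64_list=tf.train.Int64List(value=[31])),
        "movie_title": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b"Dangerous Minds (1995)"])),
        "rating": tf.train.Feature(float_list=tf.train.FloatList(value=[2.5])),
        "timestamp": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[1260759144])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


# Raw elements are scalar strings, like those yielded by tf.data.TFRecordDataset.
raw = tf.data.Dataset.from_tensor_slices([make_example()])

# Parse first; only then can elements be indexed by feature name.
parsed = raw.map(parse_rating)
selected = parsed.map(
    lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})

first = next(iter(selected))
```

If this reading is right, applying the same `map(parse_rating)` step to the `TFRecordDataset` objects from the output, before the `ratings.map(...)` call at line 155 of the tester program, might avoid the error. Whether the files written by the connector match this feature spec would still need to be checked.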