
DataLossError: corrupted record at 0 while reading tfrecord files

Open akshaydp1995 opened this issue 5 years ago • 4 comments

      1 import tensorflow as tf
      2 
----> 3 for example in tf.python_io.tf_record_iterator("ZI70YDAKXIT453SKIGZ8.tfrecord"):
      4     result = tf.train.Example.FromString(example)

1 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/tf_record.py in tf_record_iterator(path, options)
    179     while True:
    180       try:
--> 181         reader.GetNext()
    182       except errors.OutOfRangeError:
    183         break

/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py in GetNext(self)
    795 
    796     def GetNext(self):
--> 797         return _pywrap_tensorflow_internal.PyRecordReader_GetNext(self)
    798 
    799     def record(self):

DataLossError: corrupted record at 0

akshaydp1995 • Jul 11 '19 16:07

Did you get this error after downloading the data via GDrive? If so, the file may have been corrupted during the download. Try using the synthetic data generation code at github.com/hassan-mahmood/Structural_Analysis instead.

nggih • Jul 17 '19 06:07

You can use the following code to find the corrupted tfrecords:

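import glob
import os

import tensorflow as tf

total_images = 0
# glob does not expand "~", so expand the home directory explicitly.
train_files = sorted(glob.glob(os.path.expanduser('~/*.tfrecord')))
# Read the files as GZIP-compressed TFRecords.
compression = tf.python_io.TFRecordCompressionType.GZIP
options = tf.python_io.TFRecordOptions(compression)
print("validation started!")
for idx, file in enumerate(train_files):
    try:
        # Iterating over every record raises an error if one cannot be read.
        total_images += sum(1 for _ in tf.python_io.tf_record_iterator(file, options))
        print("{}: {} is ok".format(idx, file))
    except Exception as e:
        print("{}: {} is corrupted".format(idx, file))
        print(e)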

Sky9222 • Feb 12 '20 00:02

> Did you get this error after downloading the data via GDrive? If so, the file may have been corrupted during the download. Try using the synthetic data generation code at github.com/hassan-mahmood/Structural_Analysis instead.

Yes. The generated data seems to be gzip-compressed, even though the files appear to be plain tfrecord files. For generated data, there is no need to modify the data reader much.
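One way to check the gzip observation without TensorFlow is to look at the first two bytes of a file: every gzip stream starts with the magic number 0x1f 0x8b. The sketch below assumes that this check is enough to tell a compressed file from an uncompressed one; the function name looks_gzipped is made up for illustration and is not part of TensorFlow or of the TIES-2.0 code.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper for illustration only. It inspects just the first
    two bytes, so it says nothing about whether the rest of the file is intact.
    """
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

If this returns True for a .tfrecord file, the reader probably needs the GZIP compression option used in the validation scripts in this thread; reading such a file without it is one plausible cause of "corrupted record at 0".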

EmperorKaiser • Feb 07 '21 14:02

For TensorFlow v2+, the following modifications to your code need to be made:

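import glob
import os

import tensorflow as tf

total_images = 0
# glob does not expand "~", so expand the home directory explicitly.
train_files = sorted(glob.glob(os.path.expanduser('~/*.tfrecord')))
# Read the files as GZIP-compressed TFRecords.
compression = tf.compat.v1.io.TFRecordCompressionType.GZIP
options = tf.compat.v1.io.TFRecordOptions(compression)
print("validation started!")
for idx, file in enumerate(train_files):
    try:
        # In TF 2.x, tf_record_iterator is only available under tf.compat.v1.
        total_images += sum(1 for _ in tf.compat.v1.io.tf_record_iterator(file, options))
        print("{}: {} is ok".format(idx, file))
    except Exception as e:
        print("{}: {} is corrupted".format(idx, file))
        print(e)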

That said, I'm not sure what this does other than confirm the obvious: the file is corrupted, which was already assumed.
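A scan like this cannot say whether a file is really damaged or simply being read with the wrong compression setting. For files that are gzip-compressed, a rough way to separate the two cases is to decompress the whole stream with the standard library, since gzip stores a CRC-32 and a length in its trailer. This is only a sketch under that assumption; gzip_is_intact is a hypothetical helper name, and it returns False for uncompressed files as well as for damaged ones.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if `path` decompresses cleanly as a gzip stream.

    Hypothetical helper for illustration only. Reading to the end makes the
    gzip module verify the CRC-32 and length stored in the trailer, so
    truncated or altered downloads fail here. A file that is not gzip at all
    also returns False.
    """
    try:
        with gzip.open(path, "rb") as f:
            # Read in chunks so large files are never held in memory at once.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

A file that passes this check but still fails in the TFRecord reader is more likely a reader configuration problem than a damaged download.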

jonathanloganmoran • Aug 09 '22 00:08