Bug in Reading Compressed String Column in Parquet Dataset
I have a Parquet dataset with a column of serialized tf.Example protobufs. When I write the dataset without compression and read it back, the protos deserialize without problems. When I write the dataset with compression, I get errors. On closer inspection, it is clear that TFIO does not read the correct strings from the compressed dataset.
A reproducible example can be found here: https://gist.github.com/cmgreen210/639ab8ea1102c22f67db60c95a8653f5
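To pinpoint which rows come back wrong, the strings from the two reads can be compared directly. The snippet below is only a minimal sketch: it assumes the column values from the uncompressed and compressed reads have already been collected into Python lists of `bytes`, and `first_mismatch` is a hypothetical helper name, not part of TFIO or the linked gist.

```python
from typing import Optional, Sequence, Tuple


def first_mismatch(
    expected: Sequence[bytes], actual: Sequence[bytes]
) -> Optional[Tuple[int, bytes, bytes]]:
    """Return (index, expected_value, actual_value) for the first differing row.

    Returns None when both sequences hold identical values.
    `expected` and `actual` are assumed to be the string column as read
    from the uncompressed and compressed datasets, respectively.
    """
    for i, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return i, want, got
    if len(expected) != len(actual):
        # One read returned more rows than the other; report the first
        # index that exists in only one of them, using b"" for the gap.
        i = min(len(expected), len(actual))
        want = expected[i] if i < len(expected) else b""
        got = actual[i] if i < len(actual) else b""
        return i, want, got
    return None
```

Passing the values from the uncompressed read as `expected` and those from the compressed read as `actual` gives the first row index where the two disagree, which makes it easier to tell whether the strings are truncated, shifted, or otherwise corrupted.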