Bug in Reading Compressed String Column in Parquet Dataset

Open cmgreen210 opened this issue 11 months ago • 2 comments

I have a Parquet dataset with a column of serialized tf.Example protobufs. When I write the dataset without compression and read it back, the protos deserialize without problems. When I write the same dataset with compression, deserialization fails with errors. On closer inspection, TFIO is not returning the correct strings from the compressed dataset.

A reproducible example can be found here: https://gist.github.com/cmgreen210/639ab8ea1102c22f67db60c95a8653f5
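
To pin down which rows come back wrong, it can help to compare the bytes that were written against the bytes that were read, before attempting any proto parsing. The helper below is only a minimal sketch and is not taken from the linked gist: `first_mismatch` is a hypothetical name, and it assumes both sides have already been collected into Python lists of `bytes` (for example, the serialized protos before writing and the column values after reading).

```python
from typing import Optional, Sequence, Tuple


def first_mismatch(
    expected: Sequence[bytes], actual: Sequence[bytes]
) -> Optional[Tuple[int, str]]:
    """Return (row index, description) of the first difference, or None.

    Hypothetical diagnostic helper: `expected` holds the strings that were
    written, `actual` the strings read back from the dataset.
    """
    for i, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            # Find the first byte offset where the two strings diverge;
            # if one is a prefix of the other, use the shorter length.
            offset = next(
                (j for j, (a, b) in enumerate(zip(want, got)) if a != b),
                min(len(want), len(got)),
            )
            return i, (
                f"len {len(want)} vs {len(got)}, "
                f"first differing byte at offset {offset}"
            )
    if len(expected) != len(actual):
        # All shared rows matched, but one side has extra rows.
        return (
            min(len(expected), len(actual)),
            f"row count {len(expected)} vs {len(actual)}",
        )
    return None
```

Running this once against the uncompressed read and once against the compressed read would show whether the compressed strings are truncated, shifted, or different throughout, which is more specific than a proto parse error.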

cmgreen210 • Mar 15 '24 10:03