Reading custom serialization for SequenceFiles

Open jayunit100 opened this issue 11 years ago • 0 comments

Serialized HDFS files can be tricky to read, because they are sometimes:

  • Compressed
  • Encoded in a non-Writable SequenceFile format (Thrift, Avro, ...)

I wonder if I can use this API to read Thrift SequenceFiles in Python?

The sequencefile.reader class (https://github.com/matteobertozzi/Hadoop/tree/master/python-hadoop/hadoop) appears to build on base classes that could support a more advanced SequenceFile reader, one that handles custom serialization formats on top of the standard Hadoop ones.
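
To make the idea concrete, here is a rough sketch of what I mean by a pluggable reader. It does not use the actual python-hadoop API: `register`, `decode_record`, and the class name `example.JsonRecord` are all made up for illustration, JSON stands in for a real Thrift decoder, and zlib stands in for whatever codec the file actually uses. The only point is that the reader could dispatch on the value class name stored in the file header.

```python
import json
import zlib
from typing import Callable, Dict

# Hypothetical registry: maps a value class name (as recorded in a file
# header) to a function that turns raw record bytes into a Python object.
_DESERIALIZERS: Dict[str, Callable[[bytes], object]] = {}


def register(class_name: str, decode: Callable[[bytes], object]) -> None:
    """Associate a serialization class name with a decode function."""
    _DESERIALIZERS[class_name] = decode


def decode_record(class_name: str, raw: bytes, compressed: bool = False) -> object:
    """Decompress (if needed) and decode one record's raw bytes."""
    if compressed:
        # Assumption: zlib stands in for the file's real compression codec.
        raw = zlib.decompress(raw)
    try:
        decode = _DESERIALIZERS[class_name]
    except KeyError:
        raise ValueError("no deserializer registered for %r" % class_name)
    return decode(raw)


# Stand-in for a Thrift decoder: here the "custom format" is just UTF-8 JSON.
register("example.JsonRecord", lambda raw: json.loads(raw.decode("utf-8")))
```

A Thrift decoder would be registered the same way, under the class name that the writer stored in the header.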

I would potentially be able to work with you on implementing this for Thrift. Feel free to contact me directly!

jayunit100, Sep 25 '12 16:09