Need to read sequence files nicely

Open tdunning opened this issue 15 years ago • 4 comments

I would like to be able to specify two Writable classes to receive the contents of a sequence file: one for the keys and one for the values.

This is similar to the way that Avro reads naturally when we use strings() or integers() or the like. There would be a new kind of PType that describes which Writables are being used to do the reading.

tdunning avatar Nov 19 '10 00:11 tdunning

Have you seen the support for SequenceFiles added to Avro 1.4.1?

https://issues.apache.org/jira/browse/AVRO-662

http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/SequenceFileInputFormat.html

This uses reflection to infer the Avro schema used when writing the Writables, so non-static, non-transient fields are recursively written, which works well for most Writables.
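As a rough illustration of the field-selection rule described above (not Avro's actual implementation), the following sketch uses only `java.lang.reflect` to list the fields that are neither static nor transient. The `FooWritable` class here is a hypothetical stand-in with made-up fields, and unlike the reflection described above, the sketch does not recurse into field types or superclasses.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReflectedFields {
    // Hypothetical stand-in for a Writable; its fields are invented for illustration.
    static class FooWritable {
        static final int VERSION = 1;   // static: skipped
        transient long cachedHash;      // transient: skipped
        long id;                        // kept
        String label;                   // kept
    }

    // Returns the names of declared fields that are neither static nor transient,
    // sorted because getDeclaredFields() does not guarantee an order.
    static List<String> serializableFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (f.isSynthetic() || Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue;
            }
            names.add(f.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Prints the fields a reflection-based schema would be expected to cover.
        System.out.println(serializableFieldNames(FooWritable.class)); // prints [id, label]
    }
}
```

A Writable that keeps its real state in static or transient fields would lose that state under this rule, which is presumably why the approach works for most, but not all, Writables.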

This permits one to have SequenceFiles as inputs but not as outputs. Does that suffice?

cutting avatar Nov 19 '10 18:11 cutting

Read-only support probably doesn't suffice over the long term, but it could get me much further down the road in the short term.

tdunning avatar Nov 19 '10 19:11 tdunning

I tried this. What I did was to put a reference to a Writable into an Avro schema, thus:

{ "type": "record", "name": "foo", "fields": [ {"name": "a", "type": "long"}, {"name": "b", "type": {"type":"FooWritable"}} ]}

Avro's schema parser barfed on this. It would be nice if a schema like this could be used on a SequenceFile with long keys and FooWritable values.

tdunning avatar Dec 04 '10 23:12 tdunning

The intent is that FooWritable's reflected schema would be inlined above. Do you have a code example/test case of what you're trying to do?

cutting avatar Dec 06 '10 18:12 cutting