
Multiple input files support

Open • mariusmilea opened this issue 10 years ago • 1 comment

We use this tool heavily to convert a couple of GBs of JSON files into Avro every day. I needed it to accept multiple JSON files as input, hence my commit here. Originally, to convert a batch of JSON files, json2avro could only be used like this:

cat file1.json file2.json file3.json | json2avro -S schema_file output.avro

With this patch, json2avro can also be used like this:

json2avro -S schema_file file1.json file2.json file3.json output.avro

This removes the need for cat or any other utility to concatenate the input files. When running json2avro with multiple input files, it saves between 1 and 1.5 seconds per 160 MB batch of JSON files.
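On json2avro builds without this patch, the multi-file command line can be approximated with a small POSIX shell wrapper around the original stdin-based invocation (`cat ... | json2avro -S schema_file output.avro`). This is a hypothetical sketch, not part of the patch: the function name `json2avro_multi` and the `JSON2AVRO` override variable are made up for illustration. Because it still pipes through `cat`, it gives the same interface but not the speedup measured above.

```shell
# Hypothetical wrapper (not part of the patch): emulates
#   json2avro_multi -S schema_file in1.json ... inN.json output.avro
# on a json2avro build that only reads JSON from stdin.
# The converter is "json2avro" unless the JSON2AVRO variable names another command.
json2avro_multi() {
    if [ "$#" -lt 4 ] || [ "$1" != "-S" ]; then
        echo "usage: json2avro_multi -S schema_file input.json... output.avro" >&2
        return 2
    fi
    schema=$2
    shift 2

    # Keep all arguments except the last in "$@"; the last one is the output file.
    # The for-loop word list is expanded once, so rebuilding "$@" inside is safe.
    n=$#
    i=0
    for arg do
        i=$((i + 1))
        shift
        if [ "$i" -lt "$n" ]; then
            set -- "$@" "$arg"
        else
            output=$arg
        fi
    done

    # Fail early on unreadable inputs, since cat's exit status is lost in the pipe.
    for f do
        [ -r "$f" ] || { echo "json2avro_multi: cannot read $f" >&2; return 1; }
    done

    cat -- "$@" | "${JSON2AVRO:-json2avro}" -S "$schema" "$output"
}
```

Usage mirrors the patched command line, e.g. `json2avro_multi -S schema_file file1.json file2.json file3.json output.avro`.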

mariusmilea, Dec 03 '14 14:12

@spil-marius Sorry, I somehow never saw this pull request until now. How has this been working for you? Do you think it is OK to merge?

grisha, Jan 03 '16 13:01