json2avro
Multiple input files support
We use this tool heavily to convert a couple of GB of JSON files into Avro every day. It was useful for me to have the tool accept multiple JSON files as input, hence this commit. Originally, to convert a batch of JSON files, json2avro could only be used like this:
cat file1.json file2.json file3.json | json2avro -S schema_file output.avro
With this patch, json2avro can also be used like this:
json2avro -S schema_file file1.json file2.json file3.json output.avro
This eliminates the need for cat or any other utility to concatenate the input files. Running json2avro with multiple input files this way is between 1 and 1.5 seconds faster for a 160MB batch of JSON files.
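For reviewers who want a picture of the argument convention, here is a minimal sketch of how the positional arguments could be split: the last one is treated as the output file and everything before it as input files. This is an illustration only, not the code from this patch; `positional_args` and `split_positional` are hypothetical names, and `first_pos` is assumed to be the index of the first non-option argument (for example `optind` after option parsing).

```c
#include <stddef.h>

/* Hypothetical helper type for illustration; not taken from the json2avro source. */
typedef struct {
    char **inputs;       /* points into argv; NULL means "read from stdin" */
    size_t n_inputs;     /* number of input files given on the command line */
    const char *output;  /* last positional argument: the Avro output file */
} positional_args;

/*
 * Split the positional arguments argv[first_pos .. argc-1].
 * The last one is the output file, everything before it is an input file.
 * Returns 0 on success, -1 if no output file was given.
 */
int split_positional(int argc, char **argv, int first_pos, positional_args *out)
{
    int remaining = argc - first_pos;
    if (remaining < 1)
        return -1;  /* an output file is always required */

    out->output = argv[argc - 1];
    out->n_inputs = (size_t)(remaining - 1);
    /* With no input files, fall back to the original stdin behavior. */
    out->inputs = out->n_inputs > 0 ? &argv[first_pos] : NULL;
    return 0;
}
```

Under this convention, a command line with only an output file leaves `inputs` as NULL, so the original `cat ... | json2avro` usage keeps working unchanged.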
@spil-marius Sorry, I somehow never saw this pull request until now. How has this been working for you? Do you think it is OK to merge?