Support reading and writing of JSON objects (not JSON lines)

Open • dcmoura opened this issue 2 years ago • 1 comment

Currently, SPyQL can only read and write JSON lines. JSON arrays can be written by using dict_agg to aggregate everything into an array and writing a single-line JSON.

The idea is to add an argument lines=True to the json and orjson writers and processors. The processor should be able to handle single-object files as well as arrays of objects or arrays of scalars. When lines is False, the processor should load the full input into memory and then parse it. While this is not ideal, it is the most straightforward implementation. In addition, JSON arrays shouldn't be used for large data; in those cases, JSON lines should be used instead.
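To make the proposed processor behaviour concrete, here is a minimal sketch using only the standard json module. The function name iter_json_rows and its signature are hypothetical and for illustration only; they are not SPyQL's actual API, and a real implementation would need to fit into the existing processor classes.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_json_rows(stream: TextIO, lines: bool = True) -> Iterator[Any]:
    """Yield one row per JSON value (hypothetical helper, not SPyQL's API)."""
    if lines:
        # JSON lines: parse each non-empty line on its own, streaming the input.
        for line in stream:
            if line.strip():
                yield json.loads(line)
        return
    # lines=False: load the full input into memory, then parse it in one go.
    doc = json.loads(stream.read())
    if isinstance(doc, list):
        # Array of objects or array of scalars: one row per element.
        yield from doc
    else:
        # Single-object file: the whole document is a single row.
        yield doc


# Usage: the same rows come out of a JSON array and a single-object file.
rows = list(iter_json_rows(io.StringIO('[{"a": 1}, {"a": 2}]'), lines=False))
single = list(iter_json_rows(io.StringIO('{"a": 1}'), lines=False))
```

With lines=False the memory cost is the full document plus its parsed form, which is why JSON lines remain the better choice for large inputs.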

The writer should write an array of objects when lines is False.
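A rough sketch of what the writer side could look like, again with the standard json module. The name write_json_rows is hypothetical and not part of SPyQL. Note that, unlike the reader, the writer does not need to hold everything in memory: array elements can be emitted one at a time.

```python
import io
import json
from typing import Any, Iterable, TextIO


def write_json_rows(rows: Iterable[Any], out: TextIO, lines: bool = True) -> None:
    """Write rows as JSON lines or as one JSON array (hypothetical helper)."""
    if lines:
        # JSON lines: one serialized row per line.
        for row in rows:
            out.write(json.dumps(row) + "\n")
        return
    # lines=False: emit a single JSON array, one element at a time.
    out.write("[")
    for i, row in enumerate(rows):
        if i:
            out.write(",")  # separator goes before every element but the first
        out.write("\n  " + json.dumps(row))
    out.write("\n]\n")


# Usage: write two rows as a JSON array and read the result back.
buf = io.StringIO()
write_json_rows([{"a": 1}, {"a": 2}], buf, lines=False)
parsed = json.loads(buf.getvalue())
```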

dcmoura • Nov 17 '22 08:11

BTW, reading JSON arrays is done today by leveraging jq in the command-line:

jq -c '.[]' myfile.json | spyql "SELECT .aproperty FROM json"

dcmoura • Nov 17 '22 08:11