dataframe
Enhancement: Support new-line delimited JSON format/JSON lines
Thank you for the great project. It looks very promising to me.
I currently use
val df = DataFrame.readJsonStr(File("foo.ndjson").readLines().joinToString(",", "[", "]"))
to read newline-delimited JSON files, which works quite well. However, it would be much more convenient if the API offered such a function directly. It would also be nice if it worked directly on InputStreams, because readLines() already reads the entire file into memory under the hood.
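For illustration, the workaround above can be adapted to accept an InputStream. This is only a minimal sketch using the Kotlin standard library: ndjsonToJsonArray is a hypothetical helper name, not part of the DataFrame API, and it still builds the complete JSON array string in memory. It only avoids the intermediate list of lines and skips blank lines.

```kotlin
import java.io.InputStream

// Hypothetical helper (not part of the DataFrame API): wraps the non-blank
// lines of an NDJSON stream into a single JSON array string.
fun ndjsonToJsonArray(input: InputStream): String =
    input.bufferedReader(Charsets.UTF_8).useLines { lines ->
        // useLines reads lazily and closes the reader when the block returns.
        lines.map { it.trim() }
            .filter { it.isNotEmpty() }
            .joinToString(separator = ",", prefix = "[", postfix = "]")
    }

fun main() {
    val sample = "{\"a\":1,\"b\":\"x\"}\n\n{\"a\":2,\"b\":\"y\"}\n"
    println(ndjsonToJsonArray(sample.byteInputStream()))
    // prints [{"a":1,"b":"x"},{"a":2,"b":"y"}]
}
```

The resulting string could then be passed to DataFrame.readJsonStr as in the one-liner above, for example with File("foo.ndjson").inputStream() as the source.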
Hi! Can you provide a small sample of such JSON?
https://codebeautify.org/json-decode-online
This file does not conform to the JSON spec. Does JSON have a "newline-delimited" variant in its spec? You read the whole file and convert it in memory, which is very memory-hungry and slow. A JSON parser reads the file in many small parts (one buffer at a time).
Yes, I'm aware, and that is true. My file is not valid JSON; however, this format is commonly used in big data environments. The specification is available here: http://ndjson.org/
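To make the memory concern concrete: in this format each line is a complete JSON value, so a reader can process the input in bounded batches instead of loading everything at once. The sketch below is an assumption-laden illustration using only the Kotlin standard library. mapNdjsonBatches, batchSize, and handle are hypothetical names, and the handler is where a real JSON parser or DataFrame.readJsonStr could be called.

```kotlin
import java.io.InputStream

// Hypothetical helper: reads NDJSON lazily and hands the caller one JSON
// array string per batch of at most batchSize non-blank lines.
fun <T> mapNdjsonBatches(
    input: InputStream,
    batchSize: Int,
    handle: (String) -> T
): List<T> =
    input.bufferedReader(Charsets.UTF_8).useLines { lines ->
        lines.filter { it.isNotBlank() }
            .chunked(batchSize)                          // lazy batches of lines
            .map { batch -> handle(batch.joinToString(",", "[", "]")) }
            .toList()                                    // evaluate before the reader closes
    }

fun main() {
    val sample = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n"
    // The handler here just returns the batch text unchanged.
    val batches = mapNdjsonBatches(sample.byteInputStream(), batchSize = 2) { it }
    batches.forEach(::println)
    // prints [{"id":1},{"id":2}]
    // prints [{"id":3}]
}
```

Only one batch of lines is held as text at a time, although the results returned by the handler are still collected into a list.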