
Add support for json-lines in bulk loader

Open dmsolow opened this issue 5 years ago • 2 comments

JSON-lines (http://jsonlines.org/) is a commonly used format for storing a large number of JSON objects in a file, one object per line. It's better than a single JSON array of objects because a file can be read object by object without loading the entire thing into memory.
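To illustrate the point about reading object by object, here is a minimal Python sketch of a streaming reader. The function name `iter_json_lines` and the sample records are hypothetical and only for illustration; this is not how Dgraph's bulk loader works, just a demonstration of why the format is easy to consume incrementally.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_lines(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON value per non-empty line of the stream.

    Assumes each line holds a single JSON object, as in a JSON-lines file.
    Only the current line is held in memory at any time.
    """
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"invalid JSON on line {line_number}: {err}") from err


# Hypothetical sample data; a real file would be opened with
# open(path, encoding="utf-8") and passed in the same way.
sample = io.StringIO(
    '{"uid": "_:alice", "name": "Alice"}\n'
    '{"uid": "_:bob", "name": "Bob"}\n'
)
names = [record["name"] for record in iter_json_lines(sample)]
print(names)
```

Because the reader is a generator, a consumer can process or batch records as they arrive instead of parsing one large array up front.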

Popular big data processing frameworks like Apache Spark write JSON-lines natively (e.g. df.write.json("out.json") writes a JSON-lines file for each partition).

Support would probably be trivial to add for Dgraph and it would help people easily integrate Dgraph into existing ETL workflows.

dmsolow avatar Aug 15 '19 13:08 dmsolow

It doesn't seem like it would be too difficult to implement. We are gearing up for the 1.1 release, so this issue will most likely sit on the back burner for a little while. If anybody else is interested in this feature, give the issue a thumbs up so we can gauge interest and prioritize accordingly.

martinmr avatar Aug 15 '19 21:08 martinmr

GitHub issues have been deprecated. This issue has been moved to Discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.


minhaj-shakeel avatar Jul 20 '20 18:07 minhaj-shakeel