std_data_json
small suggestions for enhancement
- Better examples of how to do something useful using the stream parser with real data. You could use the Twitter feed or, for example, the metadata from Quandl: http://www.quandl.com/api/v2/datasets.json?query=*&source_code=SEC&per_page=300&page=1. I would submit my own use, but I am not that happy with it, as it is too ad hoc.
- The stream parser does not accept the kind of range returned by byLine (and its faster variants) as input (I presume it doesn't work with byChunk either). Since one often won't want to read a large file into memory all at once, it would be helpful to have examples of how to handle this well. (mmfile seems to be slow in many cases.)
- Wrappers to do the following:
  - Take a JSON stream as input and return a new JSON stream filtered by query criteria (for example, filtering the Twitter stream by topic, or selecting only the data for certain securities from the Quandl JSON).
  - Read a JSON stream and populate an array of structs (similarly to what you have in vibe.d), using either built-in allocation or a pre-allocated buffer.
> The stream parser does not accept the kind of range returned by byLine (and its faster variants) as input (I presume it doesn't work with byChunk either). Since one often won't want to read a large file into memory all at once, it would be helpful to have examples of how to handle this well. (mmfile seems to be slow in many cases.)
parseJSONValue requires a range of characters, while byLine and byChunk return ranges of strings and ubyte arrays, respectively.
Both can be turned into continuous ranges of characters/ubytes with joiner, but there is currently a problem: some internal algorithms are marked @safe while their safety actually depends on the input range (at least byChunk has unsafe primitives), causing a compilation failure. I meant to reply with this to your forum post, but I got sidetracked and forgot :)
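To make the pattern under discussion concrete, here is a minimal sketch. It assumes the stdx.data.json module layout of the std_data_json package and that parseJSONValue accepts a generic input range of characters; note that this is exactly the combination that currently triggers the @safe compilation failure described above, so it is shown as the intended usage, not working code:

```d
import std.algorithm : joiner, map;
import std.stdio : File;
import stdx.data.json : parseJSONValue; // assumed module layout of std_data_json

void main()
{
    // byChunk yields a range of ubyte[] buffers; joiner flattens it into a
    // continuous range of ubyte, and map reinterprets each element as char.
    auto chars = File("data.json").byChunk(4096).joiner.map!(b => cast(char) b);

    // Intended usage. This currently fails to compile because internal
    // algorithms are (erroneously) marked @safe while byChunk has
    // @system primitives.
    auto value = chars.parseJSONValue;
}
```

Once the erroneous explicit @safe annotations are replaced by attribute inference, this pattern should compile, with the instantiation simply inferred as @system.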
> Take a JSON stream as input and return a new JSON stream filtered by query criteria (for example, filtering the Twitter stream by topic, or selecting only the data for certain securities from the Quandl JSON).
Ideally you should be able to do this with existing generic algorithms in Phobos.
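For instance, once a document has been parsed into a value, ordinary range algorithms already express this kind of query. A sketch using Phobos' current std.json value type as a stand-in for the new one (the field names and data are invented for illustration):

```d
import std.algorithm.iteration : filter;
import std.algorithm.searching : startsWith;
import std.array : array;
import std.json : parseJSON;

void main()
{
    // Hypothetical Quandl-like metadata: an array of dataset objects.
    auto doc = parseJSON(`[
        {"code": "SEC/AAPL", "name": "Apple Inc."},
        {"code": "SEC/MSFT", "name": "Microsoft Corp."},
        {"code": "FRED/GDP", "name": "Gross Domestic Product"}
    ]`);

    // Select only the SEC datasets with a generic filter over the array.
    auto sec = doc.array
                  .filter!(d => d["code"].str.startsWith("SEC/"))
                  .array;

    assert(sec.length == 2);
    assert(sec[0]["name"].str == "Apple Inc.");
}
```

Filtering the token stream itself (without building a value first) is also conceivable but needs more care, since a filter predicate would have to track object/array nesting across tokens.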
> Read a JSON stream and populate an array of structs (similarly to what you have in vibe.d), using either built-in allocation or a pre-allocated buffer.
This is commonly requested but is beyond the scope of this library. I don't want to speak on @s-ludwig's behalf, but he has repeatedly said he wishes for serialization functionality to be built on top of this library.
I agree with that; a clear separation makes it more likely for this library to be accepted into Phobos, and less likely to overlap with Phobos' future serialization functionality (there is a std.serialization in the review queue).
Thanks for the thoughts, Jakob. The internal algorithms marked @safe - do you mean in Phobos (not std.data.json)? So in any case, if I write my own version of joiner, or of a chunked file exposing a continuous range of characters, it should work?
Makes sense to start with the low-level stuff and build on it. For the time being, though, std.data.json is obviously only a partial replacement for the functionality provided by vibe.d's JSON module.
> The internal algorithms marked @safe - do you mean in Phobos (not std.data.json)? So in any case, if I write my own version of joiner, or of a chunked file exposing a continuous range of characters, it should work?
Sorry for being unclear, I mean internal algorithms in std.data.json. It is a bug that they're marked @safe.
In this case, the memory-unsafety of File.byChunk is propagated by joiner through attribute inference, and while parseJSONValue should propagate the same, it instead errors. Whether or not File.byChunk (and File.byLine) present memory safe interfaces and should thus be @safe (or @trusted) is a relevant but separate issue.
Functions in templates (including member functions in type templates) have attribute inference. That means attributes like @safe are inferred from the instantiated body of the function. If the safety of a templated function depends on template arguments, then attribute inference should be relied on, as it allows the user to use unsafe instantiations while still getting the safety guarantee when using safe instantiations.
In other words, explicit attributes on generic algorithms should only be used when all possible instantiations provide that guarantee. An explicit attribute is valuable documentation that attribute inference does not provide, but it also commits all future changes to the implementation to preserving the guarantee; otherwise user code could break.
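To illustrate attribute inference with a self-contained (and entirely hypothetical) example: the templated apply below carries no explicit attribute, so each instantiation's @safe-ness is inferred from the instantiated body:

```d
// apply is a template, so attributes such as @safe are inferred
// per instantiation rather than declared up front.
auto apply(alias fun, T)(T value)
{
    return fun(value);
}

int twice(int x) @safe
{
    return 2 * x;
}

int peek(int x) @system
{
    int* p = &x; // taking the address of a local is not allowed in @safe code
    return *p;
}

void main() @safe
{
    // Fine: instantiating with a @safe callable infers apply!twice as @safe.
    assert(apply!twice(21) == 42);

    // Does not compile if uncommented: instantiating with a @system
    // callable infers apply!peek as @system, unusable from @safe code.
    //auto x = apply!peek(21);
}
```

Had apply been annotated @safe explicitly, the safe instantiation would gain nothing, while the unsafe one would fail to compile inside the template body instead of cleanly propagating @system to the caller.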
> Makes sense to start with the low-level stuff and build on it. For the time being, though, std.data.json is obviously only a partial replacement for the functionality provided by vibe.d's JSON module.
Indeed, but it is an important step, and luckily, it completely eclipses the present std.json.
Thank you for taking the time to write such a thorough and helpful reply - I truly appreciate it.
The benchmarks are interesting, but real-world use is often something else. I have a 2 GB file of metadata from Quandl, and until I get round to putting it in a database, I want to look up the details for a particular ticker, which means going through the whole thing. Very impressive performance from std.data.json. The only thing is that a small wrapper would be nice (the gap between good enough and high quality is often small in terms of usefulness but large in terms of work), along with some more examples. I will contribute something if I get time.
Laeeth
On 6 May 2015, at 15:41, JakobOvrum [email protected] wrote: