request-for-implementation
request-for-implementation copied to clipboard
Minimalist YAML parser with support for borrowed string slices
In https://github.com/dtolnay/serde-yaml/issues/94 I would like to move serde-yaml off of yaml-rust as a YAML backend and to a much simpler backend with support for a minimum viable subset of YAML syntax.
I think the minimal API for this would be:
pub enum Yaml<'a> {
// strings, booleans, numbers, nulls, all treated the same
Scalar(&'a str),
// flow style like `[x, x, x]`
// or block style like:
// - x
// - x
Sequence(Vec<Yaml<'a>>),
// flow style like `{x: X, x: X}`
// or block style like:
// x: X
// x: X
Mapping(Vec<Entry<'a>>),
}
pub struct Entry<'a> {
key: Yaml<'a>,
value: Yaml<'a>,
}
pub fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>;
The long tail of seldom used YAML features need not be supported. That means no tags, anchors, aliases, countless distinct string types, ...
In python world there's strictyaml with similar goal, if you want to check what elements are needed
Thanks! That looks like a super nicely done library with similar motivations as this project.
@dtolnay I think I would like to take a crack at this
Just checking in -- any progress on this @jakeschurch? Is there anything you are stuck on?
Thanks for checking in @dtolnay -- so far things are going well - will def. Check in if I have any questions or problems
Sounds good. I noticed that you forked serde-yaml and made some big changes to its Value type. I would recommend sticking to a 100% self contained fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>
instead -- serde_yaml::Value is a frontend type and shouldn't need to change at all when we later swap backends.
OK. Sounds like I need to read a little more into the code.
Less into the code. :wink:
Nothing in serde_yaml is relevant to implementing this function.
Sorry! Just meant what parts of the code now hook up to yaml-rust so I can have the signatures in mind implementing the back-end
hey @dtolnay just wanted to let you know that I haven't gotten around to implementing this yet cause I'm super busy with work - whatever you want to do, let me know -- sorry to keep you waiting
I believe it would be nice to support the additional feature using feature gates. I will be trying this as it is a blocker for me to implement yaml2json.
I added my implementation in https://github.com/pickfire/mini-yaml, very experimental. Currently can parse certain flow scalar, flow/block sequence, flow mapping and block implicit mapping. Still figuring out how to parse implicit flow mapping and maybe later need to check the spec again, maybe later can add serialization and deserialization.
Nice, looks like this is on the right track. When you wrap up the initial set of features it would be good to write a tool to programmatically partition the yaml-test-suite based on ones that are supported by your library and ones that are not supported, and check those into your repo as two separate directories. That will make it easy to scan through and tell whether the set of supported cases is sufficient for what someone needs, or if they require some feature that is in the unsupported set.
@dtolnay Thanks a lot for sharing that link. By the way, do you think I should implement it in a function way instead of using an iterator like what yaml-rust does? Maybe it might help in streaming serialization/deserialization?
I recommend not doing stream style. Stream support would be important for a lossy deserializer, for example one that deserializes 1.00
as an f64 by default. Like if the input document is:
---
n: 1.00
then a non-streaming lossy deserializer interpreting this as yaml::Map(map! { yaml::String("n") => yaml::Number(1.0f64) })
would be problematic because the caller may ultimately need the data as struct Input { n: String }
in which the correct deserialization would have n = "1.00".
Since the design in this thread is specced to produce lossless &'a str for all yaml scalars, there shouldn't need to be a stream API.
@dtolnay Looks a lot harder than I think, I probably will not be continuing the project.
If anyone is interested can take a look at https://github.com/pickfire/mini-yaml, the implementation is a bit bad but the main issue is support for implicit block mapping and block and flow scalar. I believe supporting strictyaml would be better than yaml since it's too broad for start.
I will try to create a crate based on the StrictYAML syntax.
Sounds good. Thanks! The API design in https://github.com/dtolnay/request-for-implementation/issues/9#issue-401158316 should work just as well for StrictYAML.
To make it simpler to write a parser, I thought about only implementing a super simple and minimalistic version of yaml.
These would be all the possibilities:
key1: 'value'
key2: "value"
key3: value
list:
- one
- two
list2: [foo, bar]
object:
some: "thing"
foo: bar
object2: {a: one, b: two}
value
can also be a number
Would this be okay?
Yes that sounds fine. If someone needs more syntax support they can extend your library.
Should the library support serializing and deserializing?
@Stupremee Serializing and deserializing could be added later but it should be designed with that in mind, taking a look at serde_json and serde_cbor would help but the design of yaml-rust is way too different from those.
@Stupremee—Personally for this issue I really only care about one function.
fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>
Once that is in a good state, it would be reasonable to add the reverse as well. That would be as simple as impl Display for Yaml<'_>
.
@dtolnay I noticed this has been sitting for a while, so I decided to pick this up. The repository with my work up to this point is here. Support for the bare-bones syntax you described is already implemented, and I do have a basic level of testing (~40 unit tests, but I plan to add more). I haven't published the crate yet, as I feel it needs a bit more polish and more thorough testing first, but tentatively a release doesn't seem too far off. If you have any spare time, feedback on the current state of the crate would be wonderful.
to a much simpler backend with support for a minimum viable subset of YAML syntax
How would someone that needs these "non-minimalist features" do?
@NilsIrl YAML is complicated, last time I tried implementing I was surprised how different YAML libraries got different results for different edge cases. I think StrictYAML may be a better and stricter spec and easier to follow with minimalist features.
@NilsIrl YAML is complicated, last time I tried implementing I was surprised how different YAML libraries got different results for different edge cases. I think StrictYAML may be a better and stricter spec and easier to follow with minimalist features.
I understand that very well, but if this is supposed to replace https://github.com/chyh1990/yaml-rust for https://github.com/dtolnay/serde-yaml then it can be problematic.
@nathanwhit it looks like the feedback I filed on https://github.com/nathanwhit/minimal-yaml were mostly resolved. Any plan on what polish or testing are missing still before it would be released?
@dtolnay I'm comfortable releasing it in its current state. I wouldn't be too surprised if there are some edge cases being handled improperly, but there shouldn't be any major issues at this point. I've gone ahead and published the crate here.
@dtolnay how are we doing on this?