request-for-implementation icon indicating copy to clipboard operation
request-for-implementation copied to clipboard

Minimalist YAML parser with support for borrowed string slices

Open dtolnay opened this issue 6 years ago • 31 comments

In https://github.com/dtolnay/serde-yaml/issues/94 I would like to move serde-yaml off of yaml-rust as a YAML backend and to a much simpler backend with support for a minimum viable subset of YAML syntax.

I think the minimal API for this would be:

pub enum Yaml<'a> {
    // strings, booleans, numbers, nulls, all treated the same
    Scalar(&'a str),

    // flow style like `[x, x, x]`
    // or block style like:
    //     - x
    //     - x
    Sequence(Vec<Yaml<'a>>),

    // flow style like `{x: X, x: X}`
    // or block style like:
    //     x: X
    //     x: X
    Mapping(Vec<Entry<'a>>),
}

pub struct Entry<'a> {
    key: Yaml<'a>,
    value: Yaml<'a>,
}

pub fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>;

The long tail of seldom used YAML features need not be supported. That means no tags, anchors, aliases, countless distinct string types, ...

dtolnay avatar Jan 21 '19 00:01 dtolnay

In python world there's strictyaml with similar goal, if you want to check what elements are needed

ghost avatar Jan 22 '19 20:01 ghost

Thanks! That looks like a super nicely done library with similar motivations as this project.

dtolnay avatar Jan 24 '19 03:01 dtolnay

@dtolnay I think I would like to take a crack at this

jakeschurch avatar Feb 06 '19 02:02 jakeschurch

Just checking in -- any progress on this @jakeschurch? Is there anything you are stuck on?

dtolnay avatar Feb 17 '19 21:02 dtolnay

Thanks for checking in @dtolnay -- so far things are going well - will def. Check in if I have any questions or problems

jakeschurch avatar Feb 17 '19 23:02 jakeschurch

Sounds good. I noticed that you forked serde-yaml and made some big changes to its Value type. I would recommend sticking to a 100% self contained fn parse<'a>(input: &'a str) -> Result<Yaml<'a>> instead -- serde_yaml::Value is a frontend type and shouldn't need to change at all when we later swap backends.

dtolnay avatar Feb 17 '19 23:02 dtolnay

OK. Sounds like I need to read a little more into the code.

jakeschurch avatar Feb 18 '19 20:02 jakeschurch

Less into the code. :wink:

Nothing in serde_yaml is relevant to implementing this function.

dtolnay avatar Feb 18 '19 21:02 dtolnay

Sorry! Just meant what parts of the code now hook up to yaml-rust so I can have the signatures in mind implementing the back-end

jakeschurch avatar Feb 20 '19 22:02 jakeschurch

hey @dtolnay just wanted to let you know that I haven't gotten around to implementing this yet cause I'm super busy with work - whatever you want to do, let me know -- sorry to keep you waiting

jakeschurch avatar Feb 28 '19 05:02 jakeschurch

I believe it would be nice to support the additional feature using feature gates. I will be trying this as it is a blocker for me to implement yaml2json.

pickfire avatar May 13 '19 01:05 pickfire

I added my implementation in https://github.com/pickfire/mini-yaml, very experimental. Currently can parse certain flow scalar, flow/block sequence, flow mapping and block implicit mapping. Still figuring out how to parse implicit flow mapping and maybe later need to check the spec again, maybe later can add serialization and deserialization.

pickfire avatar Jun 12 '19 14:06 pickfire

Nice, looks like this is on the right track. When you wrap up the initial set of features it would be good to write a tool to programmatically partition the yaml-test-suite based on ones that are supported by your library and ones that are not supported, and check those into your repo as two separate directories. That will make it easy to scan through and tell whether the set of supported cases is sufficient for what someone needs, or if they require some feature that is in the unsupported set.

dtolnay avatar Jun 12 '19 17:06 dtolnay

@dtolnay Thanks a lot for sharing that link. By the way, do you think I should implement it in a function way instead of using an iterator like what yaml-rust does? Maybe it might help in streaming serialization/deserialization?

pickfire avatar Jun 14 '19 01:06 pickfire

I recommend not doing stream style. Stream support would be important for a lossy deserializer, for example one that deserializes 1.00 as an f64 by default. Like if the input document is:

---
n: 1.00

then a non-streaming lossy deserializer interpreting this as yaml::Map(map! { yaml::String("n") => yaml::Number(1.0f64) }) would be problematic because the caller may ultimately need the data as struct Input { n: String } in which the correct deserialization would have n = "1.00".

Since the design in this thread is specced to produce lossless &'a str for all yaml scalars, there shouldn't need to be a stream API.

dtolnay avatar Jun 14 '19 06:06 dtolnay

@dtolnay Looks a lot harder than I think, I probably will not be continuing the project.

If anyone is interested can take a look at https://github.com/pickfire/mini-yaml, the implementation is a bit bad but the main issue is support for implicit block mapping and block and flow scalar. I believe supporting strictyaml would be better than yaml since it's too broad for start.

pickfire avatar Sep 15 '19 15:09 pickfire

I will try to create a crate based on the StrictYAML syntax.

Stupremee avatar Nov 02 '19 20:11 Stupremee

Sounds good. Thanks! The API design in https://github.com/dtolnay/request-for-implementation/issues/9#issue-401158316 should work just as well for StrictYAML.

dtolnay avatar Nov 02 '19 20:11 dtolnay

To make it simpler to write a parser, I thought about only implementing a super simple and minimalistic version of yaml.

These would be all the possibilities:

key1: 'value'
key2: "value"
key3: value
list:
  - one
  - two
list2: [foo, bar]
object:
  some: "thing"
  foo: bar
object2: {a: one, b: two}

value can also be a number Would this be okay?

Stupremee avatar Nov 02 '19 22:11 Stupremee

Yes that sounds fine. If someone needs more syntax support they can extend your library.

dtolnay avatar Nov 02 '19 22:11 dtolnay

Should the library support serializing and deserializing?

Stupremee avatar Nov 06 '19 06:11 Stupremee

@Stupremee Serializing and deserializing could be added later but it should be designed with that in mind, taking a look at serde_json and serde_cbor would help but the design of yaml-rust is way too different from those.

pickfire avatar Nov 06 '19 17:11 pickfire

@Stupremee⁠—Personally for this issue I really only care about one function.

fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>

Once that is in a good state, it would be reasonable to add the reverse as well. That would be as simple as impl Display for Yaml<'_>.

dtolnay avatar Nov 06 '19 19:11 dtolnay

@dtolnay I noticed this has been sitting for a while, so I decided to pick this up. The repository with my work up to this point is here. Support for the bare-bones syntax you described is already implemented, and I do have a basic level of testing (~40 unit tests, but I plan to add more). I haven't published the crate yet, as I feel it needs a bit more polish and more thorough testing first, but tentatively a release doesn't seem too far off. If you have any spare time, feedback on the current state of the crate would be wonderful.

nathanwhit avatar Feb 11 '20 18:02 nathanwhit

to a much simpler backend with support for a minimum viable subset of YAML syntax

How would someone that needs these "non-minimalist features" do?

NilsIrl avatar Feb 14 '20 12:02 NilsIrl

@NilsIrl YAML is complicated, last time I tried implementing I was surprised how different YAML libraries got different results for different edge cases. I think StrictYAML may be a better and stricter spec and easier to follow with minimalist features.

pickfire avatar Feb 14 '20 16:02 pickfire

@NilsIrl YAML is complicated, last time I tried implementing I was surprised how different YAML libraries got different results for different edge cases. I think StrictYAML may be a better and stricter spec and easier to follow with minimalist features.

I understand that very well, but if this is supposed to replace https://github.com/chyh1990/yaml-rust for https://github.com/dtolnay/serde-yaml then it can be problematic.

NilsIrl avatar Feb 14 '20 16:02 NilsIrl

@nathanwhit it looks like the feedback I filed on https://github.com/nathanwhit/minimal-yaml were mostly resolved. Any plan on what polish or testing are missing still before it would be released?

dtolnay avatar Jul 05 '20 11:07 dtolnay

@dtolnay I'm comfortable releasing it in its current state. I wouldn't be too surprised if there are some edge cases being handled improperly, but there shouldn't be any major issues at this point. I've gone ahead and published the crate here.

nathanwhit avatar Jul 05 '20 21:07 nathanwhit

@dtolnay how are we doing on this?

sanbox-irl avatar Dec 22 '20 00:12 sanbox-irl