ron icon indicating copy to clipboard operation
ron copied to clipboard

Add Value::Struct variant and preserve struct names and formatting

Open cswinter opened this issue 4 years ago • 14 comments
trafficstars

This change adds a ~dreadful hack~creative workaround that allows us to distinguish between structs and maps during parsing and preserve struct names. It does so by abusing the size_hint method of the MapAccessor as a side channel that exposes information about the struct name and whether we are visiting a struct or a map. I am somewhat confident that, within the context of the current implementation, this is sound and won't unexpectedly blow up in our face. The CommaSeparated struct with the unusual size_hint method is passed directly to the ValueVisitor and drained before we return control flow to any serde code, so anything else that might call the size_hint method will just receive None values. I haven't confirmed this, but with some luck, size_hint will also get inlined after monomorphization and optimized into something efficient.

This makes progress towards https://github.com/ron-rs/ron/issues/122, fixes https://github.com/ron-rs/ron/issues/189, and fixes https://github.com/ron-rs/ron/issues/264.

This change has been arrived at by sheer power of will rather than any kind of understanding of serde so there may be a better solution. There are still some smaller issues with the PR, before fixing those I first wanted to check whether this has any hope of getting merged.

cswinter avatar Oct 26 '21 15:10 cswinter

Another hitch I haven't figured out yet is that the serde serialize_struct method expects a name: &'static str which makes it unusable for Value::Struct variants with dynamic names: https://github.com/serde-rs/serde/issues/2043

cswinter avatar Oct 27 '21 04:10 cswinter

I suppose it's easy enough to adapt the Serialize::serialize implementation to take a concrete ron::ser::Serializer which can actually have a more permissive lifetime and call that in to_string/to_pretty_string. It does mean that we couldn't actually correctly implement the serde::Serialize trait on Value, not sure much we care about that.

cswinter avatar Oct 27 '21 05:10 cswinter

Hmm ok actually a little more involved because of the recursion on the Compounds. Seems pretty manageable but not sure yet what the right approach is.

cswinter avatar Oct 27 '21 05:10 cswinter

Ok I think I got it. KeepingValue Serialize seems important if you wanted to e.g. convert ron to yaml, but we can still keep the existing implementation and just serialize Struct as map which retains the existing behavior and is simply the best we can do. And then to_string_pretty can just use an enhanced version of serialize with some moderate amount of code deduplication.

cswinter avatar Oct 27 '21 06:10 cswinter

Well, I got it working but since to_string is fully generic we can't actually do something custom for Value, so the only option I see is to have a separate version of the methods just for Value. Which could still be useful but somewhat unsatisfying, not sure if there's a way to salvage this.

cswinter avatar Oct 27 '21 16:10 cswinter

Sorry, I've been punting on reviewing this. @torkleyy is this something you'd be comfortable reviewing?

kvark avatar Nov 02 '21 03:11 kvark

So I did get this to work and am using it in https://github.com/cswinter/pyron to (de)serialize Python dataclasses as ron, but there are still some issues with this that I'm not sure yet if they can be fixed. Both the ValueVisitor used in the Deserialize implementation of Value, and the Deserializer fail to conform to the expected serde interface and are public which might cause some issues if you use them in combination with a different Deserialize/Deserializer. Particularly, if you try to deserialize something else (e.g. yaml) into a ron::Value, the ValueVisitor won't behave correctly and do weird things if it receives size hints. We could possibly fix this by further changing the ron Deserializer to emit a sentinel value of usize::MAX as the first size_hint (which would never be emitted by a valid Deserializer) and allows the ValueVisitor to work normally if it doesn't see this sentinel. However, this would now break the Deserializer, and actually, my current implementation might already be broken for deserializing into structs. There could still be some way around this but I'm not sure yet if that's the way to go. It might be better to just write a new parser from scratch that is not tied to serde, which I'm considering anyway since it enables some other features that are not currently possible.

The serialization implementation that allows preserving struct names/tuples is in better shape. It requires some amount of code duplication, and unfortunately needs an explicit call to a new Value::to_string/to_pretty_string method so we can call our custom serialize method that doesn't quite work like the serde one, but I'm not aware of any glaring soundness issues. It's a lot less useful with the corresponding deserialization code of course.

cswinter avatar Nov 02 '21 03:11 cswinter

For now, my main use case for this is https://github.com/cswinter/pyron, which doesn't use ron in a way that would run into the remaining issues. I would like to fix this properly at some point but still need to decide on the right approach and exactly what features I want so I'm planning to just let this stew for a while. The main approaches I'm considering:

  • come up with one more clever idea to make this hack work
  • implement custom deserializer (and maybe also pretty printer) which isn't tied to serde
  • get serde to add some new methods for us that allow an implementation without hacks (I think this could be done in a way that doesn't break existing methods by adding alternative variants with dynamic lifetime etc. which have a default implementation that returns a specific error. Existing serializer would just never call these, and our serializer could call these to use the extra functionality when available, or revert to the current behavior for Deserializer/Deserialize implementations that don't support this functionality.)

If anyone has any other good ideas or opinions on how this should be approached I'd be glad to hear about them.

cswinter avatar Nov 02 '21 03:11 cswinter

Sorry, I've been punting on reviewing this. @torkleyy is this something you'd be comfortable reviewing?

I might be able to next week, feel free to ping me if I forget.

torkleyy avatar Nov 02 '21 08:11 torkleyy

Issue has had no activity in the last 60 days and is going to be closed in 7 days if no further activity occurs

github-actions[bot] avatar Jan 01 '22 15:01 github-actions[bot]

@torkleyy ping

kvark avatar Jan 06 '22 04:01 kvark

I've been thinking about a more AST-like Value type again and this is a sketch of what a NewValue might look like:

pub enum NewValue<S: std::borrow::Borrow<str>> {
    Unit,
    Bool(bool),
    Number(Number),
    Char(char),
    String(String),
    Bytes(Vec<u8>),
    List(Vec<NewValue<S>>),
    Map(NewMap<S>),
    NamedUnit {
        name: S,
    },
    Newtype(Box<NewValue<S>>),
    NamedNewtype {
        name: S,
        inner: Box<NewValue<S>>,
    },
    Tuple(Vec<NewValue<S>>),
    NamedTuple {
        name: S,
        elems: Vec<NewValue<S>>,
    },
    Struct {
        fields: FieldMap<S>,
    },
    NamedStruct {
        name: S,
        fields: FieldMap<S>,
    }
}

#[cfg(not(feature = "indexmap"))]
type NewMap<S> = std::collections::BTreeMap<NewValue<S>, NewValue<S>>;
#[cfg(feature = "indexmap")]
type NewMap<S> = indexmap::IndexMap<NewValue<S>, NewValue<S>>;

#[cfg(not(feature = "indexmap"))]
type FieldMap<S> = std::collections::BTreeMap<S, NewValue<S>>;
#[cfg(feature = "indexmap")]
type FieldMap<S> = indexmap::IndexMap<S, NewValue<S>>;

I think the biggest questions are:

  • how to represent Option, since the unwrap newtype variants feature includes Some and thus all of the following are allowed: None, Some(val), Some() = Some(()), Some(a, b) = Some((a, b)), and Some(a: a, b: b) = Some((a: a, b: b)). On a syntactic level, None is just a named ident and Some is just a named newtype / tuple / struct with special handling on serde's side
  • how to handle extensions - maybe we could support attributes more generally around every value subtree?
  • how to handle newtypes, since ron sometimes erases them

juntyr avatar Aug 20 '23 16:08 juntyr

I've been thinking about a more AST-like Value type again

Is there any progress made on this?

I'd like to have lossless Value, which I can have multiple attempts at trying to deserialize from for untagged enums. Trying to make something like this work:

#[derive(Debug, serde::Serialize, serde::Deserialize)]
enum A {
    A1(String),
    A2(String),
}

#[derive(Debug, serde::Serialize, serde::Deserialize)]
enum B {
    B1(String),
    B2(String),
}

#[derive(Debug, serde::Serialize, serde::Deserialize)]
#[serde(untagged)]
enum MyEnum {
    A(A),
    B(B),
}

let value = MyEnum::A(A::A1("Hello, world!".to_string()));

// valid RON serializations are:
// A1("")
// A2("")
// B1("")
// B2("")

// works fine with serde_json
let json = serde_json::to_string_pretty(&value).unwrap();
dbg!(serde_json::from_str::<MyEnum>(&json).unwrap());

// but doesn't work with ron
let ron = ron::ser::to_string_pretty(&value, ron::ser::PrettyConfig::default()).unwrap();
dbg!(ron::de::from_str::<MyEnum>(&ron).unwrap());

rlidwka avatar Dec 25 '23 11:12 rlidwka

Is there any progress made on this?

Not yet, unfortunately. Once I am back from parental leave, I will first tackle finally publishing v0.9, which includes a list of all the weird serde attribute cases that we currently cannot support because serde treats every format like JSON. If serde were to ever change that (and e.g. impl Trait in type aliases in traits makes that technically possible), an improved version of Ron’s Value type (as outlined above) would be used to fix most of these cases.

While it will take me a few more months to get back to this PR, feel free to implement something similar to my idea sketched out above (essentially an AST-like Value type that allows round trips of RON [without serde attributes] but is more an AST than a Value enum since e.g. named structs and enum variants are ambiguous in RON when we are not given type information by the user).

juntyr avatar Dec 25 '23 15:12 juntyr