toml 0.6.0 no longer allows deserializing a struct with an explicit lifetime
I've raised https://github.com/toml-rs/toml/pull/489 as a minimal repro within the existing test suite
This was an intentional change as by using toml_edit for the parser as everything gets converted to an owned String before we deserialize, making it so borrows of the original source cannot happen.
This was an intentional change as by using
toml_editfor the parser as everything gets converted to an ownedStringbefore we deserialize, making it so borrows of the original source cannot happen.
Got it. Is there a documentation change that can be made to make that explicit? I looked through the changelog but nothing stood out as the cause for this which is why I ended up raising this issue.
The changelog was updated.
So why does toml_edit convert everything to String?
toml_edit parses directly into a Document.
In theory, we could add an internal ParsedDocument that gets converted into a Document or used for serde but that is a lot of extra code to maintain and a lot of extra steps and we need a strong enough case to justify it.
In theory, we could also parse into a Document<'de> where we store Cows for the strings but I've generally found putting Cow into API items adds a lot of end-user complexity and we need a strong enough case to justify it.
As an alternative to Cow, we could also parse into enum { Span, String }, like we do with Repr and Decor but that significantly complicated the API and for toml_edit users will require a lot of unnecessary unwraps, so this would also need a strong case to justify it.
It seems quite surprising to me to just throw away the ability to zero-copy/low-allocation deserialization when that is one of the main benefits usually offered by serde integrations. Offering a Document<'de> with Cows seems pretty reasonable -- in my experience this doesn't really need to add a lot of end-user complexity.
It seems quite surprising to me to just throw away the ability to zero-copy/low-allocation deserialization when that is one of the main benefits usually offered by serde integrations.
Do you have a use case where the benefits of zero-copy / low allocation deserialization is important? Assuming its performance driven, what is the actual performance before/after this and what is your performance target and why is that performance target set?
I don't currently have such a use case. My usage of toml that prompted these questions was for Askama's configuration file.
For what is worth, I'd like to point out that serde_json's from_str just errors out on runtime if it turns out that the struct can't hold owned data because that lifetime is used in a reference like &'a str. serde_json is usually considered an example for serde deserializer and serializer design, although this potential runtime failure, which happens when serde_json unescapes strings, is also considered a footgun for the not-so-experienced.
The difference between serde_json and toml is that serde_json has a chance to reuse the borrowed data. This distinction is highlighted in the documentation on Deserializer lifetimes
I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.
I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.
What is your use case that you are avoiding allocations?
@Yuri6037 please note that due to what was stated on #505 and https://github.com/toml-rs/toml/issues/490#issuecomment-1411141895 no-allocation deserialization was not possible with the toml crate anyway, as deserializing e.g. a Cow field would always result in a Cow::Owned. ~~In other words, previous toml crate versions at best provided some placebo that zero-copy deserialization was happening when in fact it was not.~~
Edit: I reviewed the old deserialization code for a bit and indeed it seems it had some logic for handling borrowed and owned strings separately, so it turns out that in previous versions zero-copy string deserialization may indeed have been possible. But with the current design it's indeed no longer possible, even if the DeserializeOwned constraint is relaxed.
I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.
What is your use case that you are avoiding allocations?
I'm trying to improve performance in one of my app which already does far too many allocations.
@Yuri6037 please note that due to what was stated on #505 and #490 (comment) no-allocation deserialization was not possible with the
tomlcrate anyway, as deserializing e.g. aCowfield would always result in aCow::Owned. ~In other words, previoustomlcrate versions at best provided some placebo that zero-copy deserialization was happening when in fact it was not.~Edit: I reviewed the old deserialization code for a bit and indeed it seems it had some logic for handling borrowed and owned strings separately, so it turns out that in previous versions zero-copy string deserialization may indeed have been possible. But with the current design it's indeed no longer possible, even if the
DeserializeOwnedconstraint is relaxed.
I'm not using a Cow, I use from_str with a &str and all models use &'a based fields.
I'm trying to improve performance in one of my app which already does far too many allocations.
Could you expand on your use case where TOML string allocations are a performance concern in your allocation?
For context on my perspective, I deal with Cargo's performance which for certain projects can load hundreds of TOML files. Even in that situation, TOML is a fraction of a fraction of where the end-user is going and only that is noticeable in no-op runs. The only reason I'm caring about no-op run performance is for shell completions calling into cargo. Like with Cargo, TOML is mostly dealt with in application initialization and most other applications load a fraction of the TOML files.
I'm not using a Cow, I use from_str with a &str and all models use &'a based fields.
Unless you are intentionally restricting your string inputs, this will reject perfectly valid TOML files because any escaped values will need an allocation.
@epage as for a use case.
In my case I am deserializing an id. A type I also use in another type for json deserialization. Being able to work without copy there is very important (read millions of document ids parsed).
Thus the type is already DocumentId<'a> .. I wanted to also use that in toml, but cannot anymore.
My type does support DocumentId<'static> which is the 'owned' variant, but toml doesnt allow that either. Also the configuration should be usable from toml or json. Now the struct differs between the two, or i need to put a trait around it to convert.
In any case it was surprising to see
PS: I basically wrapped the structure now and build it internally to convert from String to my DocumentId
main has moved toml off of toml_edit to toml_parse and can now borrow data, see #948.
Looks like I messed up in two ways. The question is whether fixing it at this point is a breaking change. I would suspect so.
Keys delegate to their type
https://github.com/toml-rs/toml/blob/9154dcb3b2eea8a84db183806411adf081bc0977/crates/toml/src/de/deserializer/key.rs#L32
https://github.com/toml-rs/toml/blob/9154dcb3b2eea8a84db183806411adf081bc0977/crates/toml/src/de/deserializer/key.rs#L8
https://github.com/toml-rs/toml/blob/9154dcb3b2eea8a84db183806411adf081bc0977/crates/toml/src/de/parser/devalue.rs#L15
The type delegates to CowStrDeserializer
https://docs.rs/serde/latest/serde/de/trait.IntoDeserializer.html#impl-IntoDeserializer%3C'de,+E%3E-for-Cow%3C'a,+str%3E
which is implemented to only call visit_str
https://docs.rs/serde/latest/src/serde/de/value.rs.html#799-807
Values we handle directly and only call visit_str
https://github.com/toml-rs/toml/blob/9154dcb3b2eea8a84db183806411adf081bc0977/crates/toml/src/de/deserializer/value.rs#L93-L94