toml icon indicating copy to clipboard operation
toml copied to clipboard

toml 0.6.0 no longer allows deserializing a struct with an explicit lifetime

Open msfjarvis opened this issue 2 years ago • 19 comments

I've raised https://github.com/toml-rs/toml/pull/489 as a minimal repro within the existing test suite

msfjarvis avatar Jan 24 '23 05:01 msfjarvis

This was an intentional change as by using toml_edit for the parser as everything gets converted to an owned String before we deserialize, making it so borrows of the original source cannot happen.

epage avatar Jan 24 '23 13:01 epage

This was an intentional change as by using toml_edit for the parser as everything gets converted to an owned String before we deserialize, making it so borrows of the original source cannot happen.

Got it. Is there a documentation change that can be made to make that explicit? I looked through the changelog but nothing stood out as the cause for this which is why I ended up raising this issue.

msfjarvis avatar Jan 24 '23 13:01 msfjarvis

The changelog was updated.

epage avatar Jan 24 '23 13:01 epage

So why does toml_edit convert everything to String?

djc avatar Jan 24 '23 14:01 djc

toml_edit parses directly into a Document.

In theory, we could add an internal ParsedDocument that gets converted into a Document or used for serde but that is a lot of extra code to maintain and a lot of extra steps and we need a strong enough case to justify it.

In theory, we could also parse into a Document<'de> where we store Cows for the strings but I've generally found putting Cow into API items adds a lot of end-user complexity and we need a strong enough case to justify it.

As an alternative to Cow, we could also parse into enum { Span, String }, like we do with Repr and Decor but that significantly complicated the API and for toml_edit users will require a lot of unnecessary unwraps, so this would also need a strong case to justify it.

epage avatar Jan 24 '23 14:01 epage

It seems quite surprising to me to just throw away the ability to zero-copy/low-allocation deserialization when that is one of the main benefits usually offered by serde integrations. Offering a Document<'de> with Cows seems pretty reasonable -- in my experience this doesn't really need to add a lot of end-user complexity.

djc avatar Jan 26 '23 10:01 djc

It seems quite surprising to me to just throw away the ability to zero-copy/low-allocation deserialization when that is one of the main benefits usually offered by serde integrations.

Do you have a use case where the benefits of zero-copy / low allocation deserialization is important? Assuming its performance driven, what is the actual performance before/after this and what is your performance target and why is that performance target set?

epage avatar Jan 26 '23 13:01 epage

I don't currently have such a use case. My usage of toml that prompted these questions was for Askama's configuration file.

djc avatar Jan 26 '23 13:01 djc

For what is worth, I'd like to point out that serde_json's from_str just errors out on runtime if it turns out that the struct can't hold owned data because that lifetime is used in a reference like &'a str. serde_json is usually considered an example for serde deserializer and serializer design, although this potential runtime failure, which happens when serde_json unescapes strings, is also considered a footgun for the not-so-experienced.

AlexTMjugador avatar Jan 31 '23 21:01 AlexTMjugador

The difference between serde_json and toml is that serde_json has a chance to reuse the borrowed data. This distinction is highlighted in the documentation on Deserializer lifetimes

epage avatar Jan 31 '23 22:01 epage

I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.

Yuri6037 avatar Sep 18 '24 21:09 Yuri6037

I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.

What is your use case that you are avoiding allocations?

epage avatar Sep 18 '24 21:09 epage

@Yuri6037 please note that due to what was stated on #505 and https://github.com/toml-rs/toml/issues/490#issuecomment-1411141895 no-allocation deserialization was not possible with the toml crate anyway, as deserializing e.g. a Cow field would always result in a Cow::Owned. ~~In other words, previous toml crate versions at best provided some placebo that zero-copy deserialization was happening when in fact it was not.~~

Edit: I reviewed the old deserialization code for a bit and indeed it seems it had some logic for handling borrowed and owned strings separately, so it turns out that in previous versions zero-copy string deserialization may indeed have been possible. But with the current design it's indeed no longer possible, even if the DeserializeOwned constraint is relaxed.

AlexTMjugador avatar Sep 18 '24 21:09 AlexTMjugador

I just had to revert one of my projects to 0.5.11 to be able to deserialize without allocations.

What is your use case that you are avoiding allocations?

I'm trying to improve performance in one of my app which already does far too many allocations.

Yuri6037 avatar Sep 18 '24 22:09 Yuri6037

@Yuri6037 please note that due to what was stated on #505 and #490 (comment) no-allocation deserialization was not possible with the toml crate anyway, as deserializing e.g. a Cow field would always result in a Cow::Owned. ~In other words, previous toml crate versions at best provided some placebo that zero-copy deserialization was happening when in fact it was not.~

Edit: I reviewed the old deserialization code for a bit and indeed it seems it had some logic for handling borrowed and owned strings separately, so it turns out that in previous versions zero-copy string deserialization may indeed have been possible. But with the current design it's indeed no longer possible, even if the DeserializeOwned constraint is relaxed.

I'm not using a Cow, I use from_str with a &str and all models use &'a based fields.

Yuri6037 avatar Sep 18 '24 22:09 Yuri6037

I'm trying to improve performance in one of my app which already does far too many allocations.

Could you expand on your use case where TOML string allocations are a performance concern in your allocation?

For context on my perspective, I deal with Cargo's performance which for certain projects can load hundreds of TOML files. Even in that situation, TOML is a fraction of a fraction of where the end-user is going and only that is noticeable in no-op runs. The only reason I'm caring about no-op run performance is for shell completions calling into cargo. Like with Cargo, TOML is mostly dealt with in application initialization and most other applications load a fraction of the TOML files.

I'm not using a Cow, I use from_str with a &str and all models use &'a based fields.

Unless you are intentionally restricting your string inputs, this will reject perfectly valid TOML files because any escaped values will need an allocation.

epage avatar Sep 18 '24 22:09 epage

@epage as for a use case.

In my case I am deserializing an id. A type I also use in another type for json deserialization. Being able to work without copy there is very important (read millions of document ids parsed).

Thus the type is already DocumentId<'a> .. I wanted to also use that in toml, but cannot anymore.

My type does support DocumentId<'static> which is the 'owned' variant, but toml doesnt allow that either. Also the configuration should be usable from toml or json. Now the struct differs between the two, or i need to put a trait around it to convert.

In any case it was surprising to see

PS: I basically wrapped the structure now and build it internally to convert from String to my DocumentId

matthiasg avatar Dec 12 '24 21:12 matthiasg

main has moved toml off of toml_edit to toml_parse and can now borrow data, see #948.

epage avatar Jun 24 '25 21:06 epage

Looks like I messed up in two ways. The question is whether fixing it at this point is a breaking change. I would suspect so.

Keys delegate to their type https://github.com/toml-rs/toml/blob/9154dcb3b2eea8a84db183806411adf081bc0977/crates/toml/src/de/deserializer/key.rs#L32 https://github.com/toml-rs/toml/blob/9154dcb3b2eea8a84db183806411adf081bc0977/crates/toml/src/de/deserializer/key.rs#L8 https://github.com/toml-rs/toml/blob/9154dcb3b2eea8a84db183806411adf081bc0977/crates/toml/src/de/parser/devalue.rs#L15 The type delegates to CowStrDeserializer https://docs.rs/serde/latest/serde/de/trait.IntoDeserializer.html#impl-IntoDeserializer%3C'de,+E%3E-for-Cow%3C'a,+str%3E which is implemented to only call visit_str https://docs.rs/serde/latest/src/serde/de/value.rs.html#799-807

Values we handle directly and only call visit_str https://github.com/toml-rs/toml/blob/9154dcb3b2eea8a84db183806411adf081bc0977/crates/toml/src/de/deserializer/value.rs#L93-L94

epage avatar Sep 09 '25 19:09 epage