Unneeded escapes with multiline string character escapes

Open nathaniel-daniel opened this issue 7 months ago • 7 comments

Hello, I'm trying to write and load a string with newlines and tabs with toml, and I'm seeing some strange behavior with CR and HT, though this might be the intended behavior.

Here's a small reproduction:

// [dependencies]
// serde = { version = "1.0.204", features = ["derive"] }
// toml = "0.8.14"

#[derive(Debug, serde::Serialize, serde::Deserialize)]
struct Simple {
    message: String,
}

fn main() {
    let simple = Simple {
        message: "\tHello\r\nWorld!".into(),
    };
    let output = toml::to_string(&simple).unwrap();
    println!("{output}");
    let round_tripped: Simple = toml::from_str(&output).unwrap();

    // assert_eq! prints both values on failure, which makes a mismatch easier to diagnose.
    assert_eq!(simple.message, round_tripped.message);
}

I would expect (or prefer) the output to be:

message = """
	Hello
World!"""

Instead, the output is:

message = """
\tHello\r
World!"""

CR and HT are escaped.

According to the toml spec, for multiline strings:

Any Unicode character may be used except those that must be escaped: backslash and the control characters other than tab, line feed, and carriage return (U+0000 to U+0008, U+000B, U+000C, U+000E to U+001F, U+007F)
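
The rule quoted above can be written as a small predicate. This is only a sketch for illustration, not the toml crate's code: the function name needs_escape_in_multiline_basic is hypothetical, and it covers only the per-character rule from the quote (runs of quotation marks need separate handling that is not shown here).

```rust
/// Returns true if `c` must be escaped inside a TOML multi-line basic
/// string, following the per-character rule quoted from the spec.
/// Hypothetical helper for illustration; not part of the `toml` crate.
fn needs_escape_in_multiline_basic(c: char) -> bool {
    match c {
        // Backslash always introduces an escape, so it must be escaped itself.
        '\\' => true,
        // Tab, line feed, and carriage return are the permitted control characters.
        '\t' | '\n' | '\r' => false,
        // Every other C0 control character, plus DEL, must be escaped.
        '\u{0000}'..='\u{001F}' | '\u{007F}' => true,
        // Anything else may appear literally as far as this rule is concerned.
        _ => false,
    }
}
```

Under this reading, both HT and CR from the reproduction fall in the "no escape required" group.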

It looks like this library escapes some characters that don't need to be escaped. I'd ask that at least CR be written unescaped, since the library currently splits CRLF sequences: the CR is emitted as the escape sequence \r while the LF stays a literal newline. This makes interaction with line-ending normalizers, like git, very messy.

I would also prefer tabs to be written unescaped. I use tabs to format the interior content of a multiline string, and that formatting is lost (at least visually) when the string passes through this library.
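
To make the request concrete, here is a rough sketch of an encoder that behaves the way I'm asking for. It is not the toml crate's implementation; write_multiline_basic is a hypothetical name, and the sketch makes two conservative assumptions: every quotation mark is escaped (simpler than tracking runs of quotes), and a CR is left literal only when it is immediately followed by LF.

```rust
/// Hypothetical encoder sketch: renders `value` as a TOML multi-line basic
/// string, leaving tabs and CRLF pairs unescaped.
/// Not the `toml` crate's actual behavior.
fn write_multiline_basic(value: &str) -> String {
    // A newline right after the opening delimiter is trimmed by parsers,
    // so emitting one here does not change the decoded value.
    let mut out = String::from("\"\"\"\n");
    let mut chars = value.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' => out.push_str("\\\\"),
            // Assumption: escape every quote rather than only runs of three.
            '"' => out.push_str("\\\""),
            // Tab and LF are written literally.
            '\t' | '\n' => out.push(c),
            // Keep CR literal only as part of a CRLF pair.
            '\r' if chars.peek() == Some(&'\n') => out.push('\r'),
            '\r' => out.push_str("\\r"),
            // Remaining control characters use the \uXXXX escape form.
            c if (c as u32) < 0x20 || c == '\u{7F}' => {
                out.push_str(&format!("\\u{:04X}", c as u32));
            }
            c => out.push(c),
        }
    }
    out.push_str("\"\"\"");
    out
}
```

With the message from the reproduction, this writes the tab and the CRLF literally. Whether the CRLF survives the read side is a separate question, since the spec lets parsers normalize newlines.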

nathaniel-daniel • Jul 15 '24 09:07