quick-xml

How to avoid escaping double quotation marks during serialization?

Open ewenliu opened this issue 3 years ago • 8 comments

Before serialization:

r#""349DC700140D7F86A0784842780****""#

After serialization:

&quot;349DC700140D7F86A0784842780****&quot;

However, what I expect is: "349DC700140D7F86A0784842780****"

How can I keep double quotation marks from being escaped during serialization?

ewenliu avatar Feb 18 '22 10:02 ewenliu

That is not possible at the moment. In my fork, https://github.com/Mingun/fast-xml, I plan to implement an API for setting the desired level of quoting:

enum QuoteLevel {
  /// - `<` -> `&lt;`
  /// - `>` -> `&gt;`
  /// - `&` -> `&amp;`
  /// - `"` -> `&quot;`
  /// - `'` -> `&apos;`
  Full,
  /// - `<` -> `&lt;`
  /// - `>` -> `&gt;`
  /// - `&` -> `&amp;`
  Partial,
  /// - `<` -> `&lt;`
  /// - `&` -> `&amp;`
  Minimal,
}
let mut s = Serializer::new(...);
s.set_quote_level(QuoteLevel::Minimal);
...
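
To make the three proposed levels concrete, here is a minimal standalone sketch of what each one might escape. It uses only the standard library; `escape_with` is a hypothetical helper written for illustration and is not part of quick-xml or fast-xml.

```rust
/// Escaping levels mirroring the proposed enum (illustrative copy).
#[derive(Clone, Copy)]
enum QuoteLevel {
    Full,
    Partial,
    Minimal,
}

/// Hypothetical helper, not a quick-xml API: escapes `raw` according to `level`.
fn escape_with(raw: &str, level: QuoteLevel) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match (c, level) {
            // `<` and `&` are escaped at every level.
            ('<', _) => out.push_str("&lt;"),
            ('&', _) => out.push_str("&amp;"),
            // `>` is escaped at the Full and Partial levels.
            ('>', QuoteLevel::Full | QuoteLevel::Partial) => out.push_str("&gt;"),
            // Quotes are escaped only at the Full level.
            ('"', QuoteLevel::Full) => out.push_str("&quot;"),
            ('\'', QuoteLevel::Full) => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    let raw = r#""a<b" & 'c>d'"#;
    println!("{}", escape_with(raw, QuoteLevel::Full));
    println!("{}", escape_with(raw, QuoteLevel::Partial));
    println!("{}", escape_with(raw, QuoteLevel::Minimal));
}
```

With a `Minimal` (or `Partial`) level, the quoted value from the original question would pass through unchanged.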

Mingun avatar May 07 '22 18:05 Mingun

This requires a bit more work than I initially expected. BytesText and BytesStart both contain already-escaped content, so implementing the proposed serializer API while avoiding an unnecessary decode-encode round trip requires reworking the API.

I think Event needs to be split into reader::Event and writer::Event:

  • reader::Event will store escaped content, as it appears in the parsed stream. The current Event could be converted to reader::Event with minimal changes. Storing a Cow inside will probably no longer be needed.
  • writer::Event will store unescaped content and can therefore be written with the desired level of escaping.
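
The split described above could look roughly like the following sketch. `ReaderText` and `WriterText` are hypothetical stand-ins for text events on each side, written with the standard library only; they are not the actual quick-xml types, and the `escape_quotes` flag is a simplification of a real escaping-level setting.

```rust
/// Hypothetical reader-side text: holds content exactly as it appeared in the stream.
struct ReaderText<'a> {
    escaped: &'a str,
}

impl ReaderText<'_> {
    /// Decodes the five predefined entities. `&amp;` is handled last so that
    /// `&amp;lt;` decodes to `&lt;` rather than `<`.
    fn unescape(&self) -> String {
        self.escaped
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&apos;", "'")
            .replace("&amp;", "&")
    }
}

/// Hypothetical writer-side text: holds unescaped content.
struct WriterText<'a> {
    unescaped: &'a str,
}

impl WriterText<'_> {
    /// Escaping is applied only when writing, so the level can be chosen here.
    fn to_xml(&self, escape_quotes: bool) -> String {
        let mut out = self.unescaped.replace('&', "&amp;").replace('<', "&lt;");
        if escape_quotes {
            out = out.replace('"', "&quot;");
        }
        out
    }
}

fn main() {
    let parsed = ReaderText { escaped: "&quot;tag&quot;" };
    let plain = parsed.unescape();
    let outgoing = WriterText { unescaped: &plain };
    println!("{}", outgoing.to_xml(false));
    println!("{}", outgoing.to_xml(true));
}
```

Because the writer side keeps unescaped content, no decode-encode round trip is needed when the caller already has plain text.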

Mingun avatar Jun 05 '22 14:06 Mingun

This is currently possible to do with the raw, non-serde API.

use quick_xml::{escape::partial_escape, events::BytesText, Writer};
use std::io::Cursor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut writer = Writer::new_with_indent(Cursor::new(Vec::new()), b' ', 2);

    writer.create_element("test")
          .write_text_content(BytesText::from_escaped(partial_escape(r#""349DC700140D7F86A0784842780****""#.as_bytes())))?;

    print!("{}", std::str::from_utf8(&writer.into_inner().into_inner())?);

    Ok(())
}

    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
     Running `target/debug/playground`
<test>"349DC700140D7F86A0784842780****"</test>

https://docs.rs/quick-xml/latest/quick_xml/escape/fn.partial_escape.html

dralley avatar Jun 05 '22 15:06 dralley

Also this issue seems like a duplicate of https://github.com/tafia/quick-xml/issues/350

dralley avatar Jun 05 '22 16:06 dralley

Yeah, initially I thought I could provide a function for #[serde(serialize_with)], but it seems unnecessary to introduce it at the quick-xml level. I doubt that anyone will need different rules for serializing different fields. I'll close #350 as a duplicate, because this issue has more information.

Mingun avatar Jun 05 '22 17:06 Mingun

> Yeah, initially I thought, that I probably could provide a function for #[serde(serialize_with)], but it seems unnecessary to introduce it at quick-xml level.

Any updates on this? I ran into the same problem here (yes, AWS S3 again).

I tried to use serialize_with with partial_escape, but it doesn't work:

#[derive(Default, Debug, Serialize)]
#[serde(default, rename_all = "PascalCase")]
struct CompleteMultipartUploadRequestPart {
    #[serde(rename = "$unflatten=PartNumber")]
    part_number: usize,
    #[serde(rename = "$unflatten=ETag", serialize_with = "partial_escape")]
    etag: String,
}

fn partial_escape<S>(s: &str, ser: S) -> std::result::Result<S::Ok, S::Error>
where
    S: serde::Serializer,
{
    ser.serialize_str(&String::from_utf8_lossy(
        &quick_xml::escape::partial_escape(s.as_bytes()),
    ))
}

partial_escape itself produces the expected result, but after serialization the content is still escaped. It seems quick-xml applies escaping at another level.
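
A standalone sketch of why pre-escaping inside serialize_with cannot help, assuming the serializer applies full escaping to every string it receives. Both functions below are standard-library stand-ins written for illustration, not the real quick-xml code.

```rust
/// Stand-in for `quick_xml::escape::partial_escape`: escapes `&`, `<` and `>` only.
fn partial(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Stand-in for the escaping the serializer is assumed to apply to any string
/// passed to `serialize_str`: the partial set plus both quote characters.
fn serializer_escape(s: &str) -> String {
    partial(s).replace('"', "&quot;").replace('\'', "&apos;")
}

fn main() {
    // Quotes survive the partial pass, but the serializer escapes them anyway.
    println!("{}", serializer_escape(&partial("\"abc\"")));
    // Characters that the partial pass did escape end up escaped twice.
    println!("{}", serializer_escape(&partial("a<b")));
}
```

Under that assumption the custom function can only add a second layer of escaping; it cannot remove the one applied later.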

Xuanwo avatar Aug 25 '22 09:08 Xuanwo

As a quick fix, I suggest patching quick-xml locally and building against the patched version. The change should be made here: https://github.com/tafia/quick-xml/blob/be8138f26c4a98251813cd05ef45abef9531a589/src/se/mod.rs#L98-L111

-            BytesText::new(&value)
+            BytesText::from_escaped(&crate::escape::partial_escape(&value))

Alternatively, you could implement the QuoteLevel setting suggested above. If you do that quickly, it might even be included in 0.24.0, which I plan to release this weekend. We need tests for it, though, and the tests are probably the most time-consuming part.

Mingun avatar Aug 25 '22 10:08 Mingun

I should note that the quick fix suggested above will not work correctly in all cases. That code also writes attribute values, and leaving " unescaped will produce incorrect output if an attribute value contains ". The good news is that I've almost finished a new serializer that will have the setting mentioned above.
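
The attribute problem can be illustrated with a small standalone sketch. `render_attr` is a hypothetical helper using only the standard library, and the element name `part` is made up for the example.

```rust
/// Hypothetical helper: renders `<part name="value"/>`, optionally escaping `"`.
fn render_attr(name: &str, value: &str, escape_quotes: bool) -> String {
    let mut escaped = value.replace('&', "&amp;").replace('<', "&lt;");
    if escape_quotes {
        escaped = escaped.replace('"', "&quot;");
    }
    format!("<part {}=\"{}\"/>", name, escaped)
}

fn main() {
    // Without quote escaping, the embedded `"` ends the attribute value early
    // and the result is not well-formed XML.
    println!("{}", render_attr("etag", "a\"b", false));
    // With quote escaping, the attribute value stays intact.
    println!("{}", render_attr("etag", "a\"b", true));
}
```

This is why skipping quote escaping is safe for text content but not for attribute values delimited by the same quote character.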

Mingun avatar Sep 06 '22 18:09 Mingun