quick-xml

Serializing CDATA

Status: Open. keithmss opened this issue 3 years ago; 11 comments.

My apologies if this is an inappropriate place to raise this issue.

I have the following structure that I am trying to serialize:

use serde::Serialize;

/// `Description` field for `Tag`.
#[derive(Serialize)]
pub(super) struct Description {
    #[serde(rename = "$value")]
    value: String,
}

impl Description {
    /// Create a new `Description`.
    pub(super) fn new(input: &str) -> Self {
        let value = format!("<![CDATA[{}]]>", input);
        Self { value }
    }
}

As you can see, I'm wrapping the value in CDATA because the file format I'm writing, which I have no control over, requires it. This structure is actually just part of a larger structure:

use quick_xml::{de::DeError, se::to_string};

#[derive(Serialize)]
#[serde(rename_all = "PascalCase")]
pub(crate) struct Tag {
    name: String,
    tag_type: TagType,
    data_type: DataType,
    dimensions: Option<usize>,
    radix: Radix,
    constant: Constant,
    external_access: ExternalAccess,
    description: Description,
}

impl Tag {
    /// Return an XML `String` representation of `Tag`.
    pub(crate) fn to_xml(&self) -> Result<String, DeError> {
        to_string(self)
    }
}

This serializes correctly, except for the description field:

<Tag Name="test_dint" TagType="Base" DataType="DINT" Radix="Decimal" Constant="false" ExternalAccess="Read Only"><Description>&lt;![CDATA[Test DINT]]&gt;</Description></Tag>

It looks like the CDATA markers are being escaped during serialization:

<Description>&lt;![CDATA[Test DINT]]&gt;</Description>
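This is ordinary text escaping: the serializer treats the whole String as character data, so the markup characters in the hand-written CDATA markers are replaced by entities. A minimal std-only sketch of that kind of escaping reproduces the output above. escape_text is a hypothetical helper, not quick-xml's own escape function, and it only handles &, < and >.

```rust
/// Hypothetical helper: escapes the characters that cannot appear
/// literally in XML text content (only `&`, `<` and `>` are handled here).
pub fn escape_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            // Every other character is copied unchanged.
            other => out.push(other),
        }
    }
    out
}
```

Since the markers are just characters inside the string, escaping alone cannot tell them apart from the data.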

Is there anything I can do to keep it unescaped?

keithmss, Jan 26 '22

Actually, the serializer is working as expected. You should not write <![CDATA[]]> manually as part of your data, because it is part of the XML syntax, not of the content. As far as I know, the serializer currently never uses CDATA when serializing strings. That can be improved.
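In other words, the <![CDATA[ and ]]> markers should come from the serializer, not from the value. Below is a minimal std-only sketch of how a serializer could write a string as a CDATA section; write_cdata is a hypothetical helper, not part of quick-xml. Because the sequence ]]> cannot appear inside a CDATA section, the sketch uses the common trick of splitting it across two sections.

```rust
/// Hypothetical helper: wraps `value` in a CDATA section.
/// Any `]]>` in the value would close the section early, so each occurrence
/// is rewritten as `]]` + `]]>` + `<![CDATA[` + `>`: the first section ends
/// after `]]` and the next one begins with `>`.
pub fn write_cdata(value: &str) -> String {
    let body = value.replace("]]>", "]]]]><![CDATA[>");
    format!("<![CDATA[{}]]>", body)
}
```

If a serializer wrote strings this way, Description could store the plain input ("Test DINT") and the markers would be added on output.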

Because serde's design limits how options can be passed from a structure to the serializer, a possible solution is a serializer setting that controls how strings are written:

  • text always
  • CDATA always
  • a function to select Text/CDATA representation
  • an automatic choice based on the content, as XmlBeans does, for example. That choice could be implemented as the default selection function from the previous point
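The last two options could share one shape: a selection function from the string to a format, with an automatic choice as its default. A minimal std-only sketch follows. StringFormat, auto_format and the threshold parameter are hypothetical names, and the heuristic (switch to CDATA once enough characters would need escaping) is a simple stand-in, not XmlBeans's exact rule.

```rust
/// Hypothetical representation choice for a serialized string.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StringFormat {
    Text,
    CData,
}

/// Hypothetical selection function: use CDATA once `threshold` or more
/// characters would otherwise have to be escaped as entities.
/// Strings containing `]]>` stay as text, because CDATA cannot hold that
/// sequence without splitting the section.
pub fn auto_format(value: &str, threshold: usize) -> StringFormat {
    let escapes = value
        .chars()
        .filter(|c| matches!(c, '<' | '>' | '&'))
        .count();
    if threshold > 0 && escapes >= threshold && !value.contains("]]>") {
        StringFormat::CData
    } else {
        StringFormat::Text
    }
}
```

A caller-supplied selector for the third option could then be a plain fn(&str) -> StringFormat, for example the non-capturing closure |s| auto_format(s, 3).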

PR is welcome!

Mingun, May 21 '22

Any update on this issue?

kulame, Sep 28 '22

I plan to open a PR that reworks the serializer soon, but I am leaving CDATA support to potential contributors. I left TODOs where the corresponding code should be added.

Mingun, Sep 28 '22

I'm willing to work on this if you can point me in the right direction.

keithmss, Dec 06 '22

For this specific feature, the place where you should choose how to write a string is https://github.com/tafia/quick-xml/blob/fb079b6714d7238d5180aaa098c5f9b02dbcc7da/src/se/content.rs#L64-L76

I think you can start by adding tests in dedicated cdata modules, placed just after these modules:

  • https://github.com/tafia/quick-xml/blob/fb079b6714d7238d5180aaa098c5f9b02dbcc7da/src/se/content.rs#L605-L606
  • https://github.com/tafia/quick-xml/blob/fb079b6714d7238d5180aaa098c5f9b02dbcc7da/src/se/content.rs#L794-L795

You are free to add additional tests if you think that they would be valuable.

After that, you can add a

pub enum TextFormat {
  Text,
  CData,
}

and a field of this type to the serializer, then write Text or CDATA depending on its value. If you wish, you could also add an Auto variant (and make it the default). The algorithm for choosing one format or the other could be borrowed from here.
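A minimal std-only sketch of that idea follows. It assumes a hypothetical SketchSerializer with a text_format field; it is not quick-xml's actual serializer code. The enum gains the optional Auto variant as its default, and one method decides how each string is written. The Auto rule used here (CDATA only when escaping would otherwise be needed and the value contains no ]]>) is an assumption, not the algorithm referenced above.

```rust
/// Format setting suggested above, plus the optional `Auto` variant as default.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum TextFormat {
    Text,
    CData,
    #[default]
    Auto,
}

/// Hypothetical serializer holding the format setting and an output buffer.
#[derive(Default)]
pub struct SketchSerializer {
    pub text_format: TextFormat,
    pub output: String,
}

impl SketchSerializer {
    /// Writes `value` either as escaped text or as a CDATA section.
    pub fn write_str(&mut self, value: &str) {
        let needs_escape = value.contains(|c: char| matches!(c, '<' | '>' | '&'));
        let as_cdata = match self.text_format {
            TextFormat::Text => false,
            TextFormat::CData => true,
            // Assumed rule: CDATA only when it avoids escaping and needs no split.
            TextFormat::Auto => needs_escape && !value.contains("]]>"),
        };
        if as_cdata {
            // Split any `]]>` so it cannot close the section early.
            self.output.push_str("<![CDATA[");
            self.output.push_str(&value.replace("]]>", "]]]]><![CDATA[>"));
            self.output.push_str("]]>");
        } else {
            // Plain text: escape the characters that are not allowed literally.
            for c in value.chars() {
                match c {
                    '&' => self.output.push_str("&amp;"),
                    '<' => self.output.push_str("&lt;"),
                    '>' => self.output.push_str("&gt;"),
                    other => self.output.push(other),
                }
            }
        }
    }
}
```

Keeping the decision in a single method means the Text, CData and Auto behaviours can each be covered by the dedicated cdata tests mentioned above.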

Mingun, Dec 06 '22

Has there been any progress on supporting CDATA? Does anyone know of a workaround, perhaps a custom serde serialization of strings that emits CDATA?

bartdorsey, May 04 '23

@Mingun Maybe it would be better to define a new special name, $cdata, that deserializes exactly like $text but always serializes as CDATA?

Edit: I dug into the code, and the answer is no: it would be more complex than that.

anton-dutov, Jul 04 '23

How about a solution that allows writing raw strings without escaping their characters? Then at least @keithmss's workaround would work.

jb-alvarado, Sep 02 '23