fluidxml icon indicating copy to clipboard operation
fluidxml copied to clipboard

XML special chars break the document

Open EligiusSantori opened this issue 8 years ago • 10 comments

Hello. I got "Invalid Character Error" exception when i had tried to make XML document from db data because one of records contained "<" symbol. Then i have been required to check every xml node in my code for special chars and wrap it in CDATA block manually. I think the data should be escaped or CDATA-wrapped automatically by default. Thanks!

\FluidXml\fluidxml()->add('root', '12345')->xml(); // ok \FluidXml\fluidxml()->add('root', '<12345>')->xml(); // error

EligiusSantori avatar Jul 01 '16 15:07 EligiusSantori

I encountered a similar problem when trying to ->add() and the string contained the character &:

DOMElement::__construct(): unterminated entity reference ...

I was able get around it (kind of) by wrapping the string in htmlspecialchars().

I haven't tested it but it might be possible to use the XML Fragment method instead -- @EligiusSantori this might be easier than wrapping with CDATA blocks.

@daniele-orlando , if the XML Fragment works would it be out of scope to ask that fluidxml catches this error and automatically attempts to use the fragment method?

FoxxMD avatar Jul 08 '16 20:07 FoxxMD

Guys, the ampersand & like <, > and other characters must be escaped in XML. Nothing surprising here. You can use CDATA blocks (they exist for this) or convert the special characters to a valid XML representation. XML Fragments are a different thing, they are valid XML, not just a trick to escape invalid characters.

daniele-orlando avatar Jul 09 '16 21:07 daniele-orlando

I'll evaluate for the next release (2.0) to catch the exception and convert any special character to a valid XML representation.

daniele-orlando avatar Jul 09 '16 21:07 daniele-orlando

Hi, sorry about being offtopic but the usage of ->cdata() is not clear to me.

How would one go with adding CDATA to multiple elements using chaining methods?

        $xml
            ->add('property', true)
                ->add('id', $property->id)
                ->add('url')->cdata($property->url_web) // NOPE
                ->add('title', true)->cdata($property->title) // NOPE NOPE

Thanks in advance.

Phr33d0m avatar Jul 19 '16 11:07 Phr33d0m

If you want to stay fluent:

    $xml->add('property', true)
          ->add('id', $property->id)
          ->add('url', true)
            ->cdata($property->url_web)
          ->query('..')
          ->add('title', true)
            ->cdata($property->title);

Otherwise:

    $prop = $xml->add('property', true);
    $prop->add('id', $property->id);
    $prop->add('url', true)->cdata($property->url_web);
    $prop->add('title', true)->cdata($property->title);

daniele-orlando avatar Jul 19 '16 14:07 daniele-orlando

Hi, any update on this? Please escape characters correctly:

// Breaks
$fluid_xml->add([
    'Property' => [
        'ID' => 1234,
        'Test' => 'Hello & World'
    ]
]);

As I'm working from an array like this, I can't easily change this to use your ->cdata() method (the real code is more complicated).

This is quite a basic feature, I'm surprised it's missing. This potential bug could easily be missed and then break in production the first time a user types an & into your system somewhere.

andzandz avatar Jun 09 '17 13:06 andzandz

I don't see the point of not escaping by default. What use cases does not escaping cover?

shoe-diamente avatar Oct 26 '18 15:10 shoe-diamente

Not that what you laid out here will not work, but I think it about the features of the library. I found this library search for a way to convert an array to XML, which the library does well except for CDATA objects. I see this function...

public function addChild($child, ...$optionals) { return $this->chooseContext(function($cx) use ($child, &$optionals) { return $cx->addChild($child, ...$optionals); }); }

...and it seems there is no way to add a CDATA object adding an array to a node. It's not deal breaker, but it would be nice to have.

jedlynch avatar May 09 '19 19:05 jedlynch

So this is still an issue and doesn't seem like it will ever be fixed. I've forked it to fix it (messily)

on addChild for a simple string you can optionally use htmlentities() to encode your string

Usage

<?php

$options = [
	'htmlentities' => $args, // use to optimistically encode (always run value through htmlentities())
	'exceptionHtmlentities' => $args // use to encode and recreate element only if an exception with a message containing 'unterminated entity reference' is thrown
];

// only one option can be used at a time -- if you use htmlentities then exceptionHtmlentities will be ignored

$args = true; // use htmlentities() without args
$args = array(); // spread the array as arguments EX htmlentities($value, ...$args)

// use as defaults
$fluidDoc = new FluidXml(null, ['htmlentities' => $args]);
// use on addChild()
$fluidDoc->addChild('MyNode', 'someValue', ['htmlentities' => $args]);

//EX using as defaults, only on exception, with some htmlentities arguments
$fluidDoc = FluidXml(null, ['exceptionHtmlentities' => [(ENT_NOQUOTES | ENT_SUBSTITUTE)]]);

fork is at https://github.com/Fulfillment-dot-com/fluidxml/tree/specialchars

ADDITIONALLY: my branch moved composer.json to the root of the directory because cloning from git on windows may not respect symlinks and honestly the only reason it isn't already like this is because the author has some vim issue or something -- its a personal preference that's been foisted on anyone trying to develop on this repo.

So to use this fork change your composer.json like so:

{
  "require": {
    ...
    "servo/fluidxml": "dev-specialchars",
  },
  "repositories": [
    ...
    {
      "type": "git",
      "url": "https://github.com/Fulfillment-dot-com/fluidxml.git"
    }
  ],
}

P.S. It's a big bummer that so many members of FluidXml class are all private instead of protected. This could easily have been solved by extending FluidXml to use a modified FluidInsertionHandler but instead I had to do this whole fork just to manage it :/

FoxxMD avatar Aug 20 '20 18:08 FoxxMD

Use htmlspecialchars()

dev-guidolin avatar Aug 24 '20 00:08 dev-guidolin