
Fails parsing cdata markup that is followed by escaped markup

Open · dy-dx opened this issue 6 years ago · 4 comments

When parsing this valid XML:

```xml
<rss version="2.0">
  <channel>
    <item>
      <description>
        <![CDATA[<a src="http://foo.com?foo&bar=baz"></a>]]>&amp;
      </description>
    </item>
  </channel>
</rss>
```

gofeed fails with the error message:

```
unknown predefined entity &bar=baz"></a>]]>&amp;
```

dy-dx · Feb 01 '19 20:02

I can confirm that this is not a problem in encoding/xml, which decodes this input without errors. See https://play.golang.org/p/wWJicjEa-iv
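The playground snippet isn't reproduced here. As a rough stand-in, the sketch below unmarshals the reported feed with the standard library's `xml.Unmarshal` into minimal structs. The `rss` type and the `descriptionText` helper are hypothetical names for illustration, not gofeed types. encoding/xml hands back the CDATA content verbatim (so the `&` in `foo&bar` is left alone) and decodes the trailing `&amp;` to `&`.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// rss is a hypothetical, minimal struct covering only the elements in the report.
type rss struct {
	Channel struct {
		Items []struct {
			Description string `xml:"description"`
		} `xml:"item"`
	} `xml:"channel"`
}

// input is the feed from the bug report.
const input = `<rss version="2.0">
  <channel>
    <item>
      <description>
        <![CDATA[<a src="http://foo.com?foo&bar=baz"></a>]]>&amp;
      </description>
    </item>
  </channel>
</rss>`

// descriptionText decodes doc and returns the first item's description,
// with surrounding whitespace trimmed.
func descriptionText(doc string) (string, error) {
	var feed rss
	if err := xml.Unmarshal([]byte(doc), &feed); err != nil {
		return "", err
	}
	if len(feed.Channel.Items) == 0 {
		return "", fmt.Errorf("no items in feed")
	}
	return strings.TrimSpace(feed.Channel.Items[0].Description), nil
}

func main() {
	desc, err := descriptionText(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(desc)
}
```

Running it prints `<a src="http://foo.com?foo&bar=baz"></a>&`, i.e. the CDATA text followed by the decoded ampersand.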

ghost · Feb 01 '19 21:02

@dy-dx @lutzhorn thank you for the report and confirmation.

CDATA parsing is currently a hack and needs to be rewritten. We aren't using encoding/xml's Unmarshal because it wasn't flexible enough for gofeed's requirements, so we don't get its CDATA handling for free.

I'll try to take a look at CDATA handling soon. Perhaps we can pull some code from encoding/xml itself.
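The error text in the report (the "entity" runs from `&bar=baz` inside the CDATA section all the way to the `;` of `&amp;`) suggests that entity decoding was applied to CDATA content. This is an inference from the message, not from gofeed's source. The sketch below shows the general rule a rewrite would need: CDATA sections are copied verbatim and only the text outside them is entity-decoded. It is not gofeed's code or the fix that eventually landed, and `decodeText` / `unescapeEntities` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// predefined holds XML's five predefined entities.
var predefined = map[string]string{
	"amp": "&", "lt": "<", "gt": ">", "quot": `"`, "apos": "'",
}

// unescapeEntities decodes predefined entities and numeric character
// references in text that lies outside any CDATA section.
func unescapeEntities(s string) (string, error) {
	var b strings.Builder
	for {
		i := strings.IndexByte(s, '&')
		if i < 0 {
			b.WriteString(s)
			return b.String(), nil
		}
		b.WriteString(s[:i])
		s = s[i+1:]
		j := strings.IndexByte(s, ';')
		if j < 0 {
			return "", fmt.Errorf("unterminated entity &%s", s)
		}
		name := s[:j]
		s = s[j+1:]
		if v, ok := predefined[name]; ok {
			b.WriteString(v)
			continue
		}
		if strings.HasPrefix(name, "#") {
			num, base := name[1:], 10
			if strings.HasPrefix(num, "x") {
				num, base = num[1:], 16
			}
			r, err := strconv.ParseUint(num, base, 32)
			if err != nil {
				return "", fmt.Errorf("invalid character reference &%s;", name)
			}
			b.WriteRune(rune(r))
			continue
		}
		return "", fmt.Errorf("unknown predefined entity &%s;", name)
	}
}

// decodeText splits raw element content into CDATA sections, copied
// verbatim, and ordinary text runs, which are entity-decoded.
func decodeText(raw string) (string, error) {
	const cdataOpen, cdataClose = "<![CDATA[", "]]>"
	var b strings.Builder
	for {
		i := strings.Index(raw, cdataOpen)
		if i < 0 {
			// No more CDATA: decode the remaining ordinary text.
			text, err := unescapeEntities(raw)
			if err != nil {
				return "", err
			}
			b.WriteString(text)
			return b.String(), nil
		}
		text, err := unescapeEntities(raw[:i])
		if err != nil {
			return "", err
		}
		b.WriteString(text)
		raw = raw[i+len(cdataOpen):]
		j := strings.Index(raw, cdataClose)
		if j < 0 {
			return "", fmt.Errorf("unterminated CDATA section")
		}
		b.WriteString(raw[:j]) // CDATA content is never entity-decoded
		raw = raw[j+len(cdataClose):]
	}
}

func main() {
	out, err := decodeText(`<![CDATA[<a src="http://foo.com?foo&bar=baz"></a>]]>&amp;`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Scanning for the CDATA delimiters first, before any `&` handling, is what keeps `&bar=baz` from being mistaken for an entity reference.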

mmcdole · Feb 02 '19 04:02

I have opened a PR to address this issue: https://github.com/mmcdole/gofeed/pull/120. PTAL @mmcdole

OrKoN · Apr 06 '19 18:04

This issue is fixed on the latest master. Here's the commit: https://github.com/mmcdole/gofeed/commit/22a67f9156f2a9c28d04dc012f5d24e1d7f2c49b

sudhanshuraheja · Jun 16 '20 18:06