
Wrong/missing encoding of ampersands in atom/rss feeds

Status: Open · unriccio opened this issue 7 years ago · 4 comments

Hi, I got an "XML Parsing Error: not well-formed" using the pluto-generated feeds available on http://blogs.openstreetmap.org/ because of a wrong re-encoding (or missing encoding) of the ampersand symbol within links/guids.

I don't know the details of that instance (version/release, environment, etc.), but one of the developers suggested it is a pluto issue. Could you please check? The test case and details are in gravitystorm/blogs.osm.org#28. (I checked past/closed issues about this but couldn't find any.)

Thanks.

unriccio, May 25 '17 16:05

@unriccio Thanks for reporting. Will look into the encoding of ampersands issue. Cheers.

geraldb, May 25 '17 17:05

I'm adding your original ticket / issue over here for easier reference:

I got some kind of "XML Parsing Error: not well-formed" using my favourite feed reader. I see the same error when opening the Atom/RSS feeds in Firefox and Chromium.

It seems to be caused by a wrong re-encoding (or missing encoding) of the ampersand symbol within the link and guid elements. An example follows (sorry, it's a spam entry, but I think the issue still applies):

Original feed:

riccio@hactar:/tmp$ wget -q http://www.openstreetmap.org/diary/rss -O - | fgrep Beer | egrep "(link|guid)"
      <link>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379</link>
      <guid>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden/diary/41379</guid>

Aggregator output:

riccio@hactar:/tmp$ wget -q https://blogs.openstreetmap.org/rss20.xml -O - | fgrep Beer | egrep "(link|guid)"
  <guid>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379</guid>
  <link>http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379</link>

Note: So %20&amp;%20 gets changed to %20&%20 in the guid and link tags. Is this correct? Why does it break the XML parsing? Needs to be checked.

geraldb, Jan 26 '20 15:01

Yep, correct.

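The ampersand is used to start entity references (as indeed shown by "&amp;"), so the parser will try to interpret "&%20" as if it were a proper entity. Since "%20" is not a valid entity name, the parser rejects the feed as not well-formed.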
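To make the failure concrete, here is a minimal Ruby sketch of the check a strict XML parser effectively applies to every "&" in character data. The names `ENTITY_REF` and `bad_ampersands` are hypothetical, and the regex is only a simplified approximation of the XML 1.0 reference syntax: it accepts ASCII names only and does not verify that a named entity is actually declared.

```ruby
# Simplified XML 1.0 reference syntax: &name; or &#digits; or &#xhex;
# (hypothetical constant; real XML names also allow many non-ASCII characters).
ENTITY_REF = /\A&(?:[A-Za-z_:][A-Za-z0-9._:-]*|#[0-9]+|#x[0-9A-Fa-f]+);/

# Hypothetical helper: returns the index of every "&" in a text node that does
# not start a reference matching ENTITY_REF, i.e. every "&" a strict parser
# would reject as not well-formed.
def bad_ampersands(text)
  positions = []
  text.each_char.with_index do |ch, i|
    next unless ch == "&"
    positions << i unless ENTITY_REF.match?(text[i..-1])
  end
  positions
end

original = "Harvest%20Moon%20Yakitori%20&amp;%20Beer%20Garden" # as in the openstreetmap.org feed
broken   = "Harvest%20Moon%20Yakitori%20&%20Beer%20Garden"     # as re-emitted by the aggregator

p bad_ampersands(original) # prints []
p bad_ampersands(broken)   # prints [28]
```

The "&" in `&amp;` is followed by a valid name and a semicolon, so it passes; the "&" in `&%20` is followed by "%", which cannot start a name or `#`, so it is flagged.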

unriccio, Feb 03 '20 00:02

Sorry for the long wait. I finally got around to checking this in detail. The error is in the feed templates (in the openstreetmap repo): they are missing the XML escape (CGI::escapeHTML) for guid and link that would turn the "unescaped" & back into an escaped &amp;. I will try to send in a pull request later today and then close this ticket. Again, thanks for reporting the error. Keep it up. Cheers. Prosit 2020!
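A minimal Ruby sketch of the fix described above, assuming an ERB-style item template. The template text, `ITEM_TEMPLATE` and `render_item` are hypothetical illustrations, not the actual templates from the repo; the only point is that link and guid pass through `CGI.escapeHTML` before being written into the XML.

```ruby
require "cgi" # for CGI.escapeHTML
require "erb"

# Hypothetical item template (not the repo's actual one): link and guid are
# passed through CGI.escapeHTML, so a literal "&" is written out as "&amp;".
ITEM_TEMPLATE = <<~XML
  <item>
    <link><%= CGI.escapeHTML(link) %></link>
    <guid><%= CGI.escapeHTML(guid) %></guid>
  </item>
XML

# Renders one item; link and guid become local variables inside the template.
def render_item(link, guid)
  ERB.new(ITEM_TEMPLATE).result_with_hash(link: link, guid: guid)
end

# After parsing the upstream feed, the aggregator holds the decoded URL,
# i.e. with a plain "&" between the %20 sequences.
url = "http://www.openstreetmap.org/user/Harvest%20Moon%20Yakitori%20&%20Beer%20Garden/diary/41379"
puts render_item(url, url)
```

Running it prints an `<item>` whose `<link>` and `<guid>` contain `%20&amp;%20` again, matching the original openstreetmap.org feed.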

geraldb, Feb 03 '20 11:02