django-oembed

URLs replaced in inappropriate contexts (e.g. inside <a href="">...)

Open carljm opened this issue 15 years ago • 1 comment

This is similar/related to #3, but it's a broader issue, not specific to Wikipedia.

There is no context-sensitivity in the replacement, so we've had cases where a link to a Flickr photo (that was intended to be just a link) got replaced with totally invalid HTML:

<a href="http://www.flickr.com/photos/gruber/4309828383">something</a>

gets turned into:

<a href="<img src="http://farm3.static.flickr.com/2690/4309828383_6cc07082f6_m.jpg" alt="Jobs Listens to Mossberg\'s Ideas About What\'s Wrong With the iPad"></img>">something</a>
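
The failure mode can be reproduced with a minimal sketch. The pattern and the `fake_embed` helper below are hypothetical stand-ins, not django-oembed's actual code; they only illustrate what happens when a URL regex has no awareness of the markup around a match.

```python
import re

# Hypothetical pattern, for illustration only; not the one django-oembed ships.
FLICKR_URL = re.compile(r"http://www\.flickr\.com/photos/[^\s\"'<>]+")


def fake_embed(match):
    # Stand-in for the oEmbed lookup: return an <img> tag for any matched URL.
    return '<img src="http://example.com/thumb.jpg" alt="photo">'


def naive_replace(text):
    # Replace every match, regardless of where in the markup it appears.
    return FLICKR_URL.sub(fake_embed, text)


html = '<a href="http://www.flickr.com/photos/gruber/4309828383">something</a>'
broken = naive_replace(html)
# The <img> tag lands inside the href attribute, which is invalid HTML.
```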

I realize that given the way OEmbed uses regexes, this is a tough nut to crack in the general case. Is the only real solution to never run OEmbed on chunks of text that might already contain HTML?

Apart from the heavyweight options that don't seem realistic (parsing the text into a DOM tree and running OEmbed only on the text nodes?), one simple "80%" fix would be to require at least one whitespace character on either side of the URL. Technically a link could have href=" http://..." but that's pretty unlikely, so I think this would improve the situation quite a bit.
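
A rough sketch of the whitespace-boundary idea, under some assumptions: the URL pattern and `fake_embed` are hypothetical placeholders rather than django-oembed internals, the start and end of the text are treated as boundaries too (my assumption, so a bare URL on its own line still matches), and this is not necessarily how an actual patch would do it.

```python
import re

# Hypothetical URL pattern, for illustration only.
URL = r"http://www\.flickr\.com/photos/[^\s\"'<>]+"

# (?<!\S) means "not preceded by a non-whitespace character", i.e. the match
# must follow whitespace or the start of the text. (?!\S) is the mirror image
# for the end of the match.
BOUNDED_URL = re.compile(r"(?<!\S)" + URL + r"(?!\S)")


def fake_embed(match):
    # Stand-in for the real oEmbed lookup.
    return '<img src="http://example.com/thumb.jpg" alt="photo">'


def embed_bounded(text):
    # Only URLs delimited by whitespace (or text boundaries) are replaced.
    return BOUNDED_URL.sub(fake_embed, text)


linked = '<a href="http://www.flickr.com/photos/gruber/4309828383">something</a>'
bare = "Look at http://www.flickr.com/photos/gruber/4309828383 today"

linked_result = embed_bounded(linked)  # URL inside href is left alone
bare_result = embed_bounded(bare)      # bare URL is replaced
```

The trade-off is that a URL sitting directly against a tag, such as `<p>http://...</p>`, is also skipped, which is part of why this is only an "80%" fix.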

Would a working patch like that be considered, or is this just a case of "don't do that"?

carljm, Jan 28 '10 16:01

simple-as-possible candidate fix here: http://github.com/carljm/django-oembed/commit/a8a743b1db1305903b1acd8409a551cb557de75c

carljm, Jan 28 '10 17:01