external-media-without-import icon indicating copy to clipboard operation
external-media-without-import copied to clipboard

URLs containing `&` don't work

Open pjeby opened this issue 4 years ago • 8 comments

Because of the way post GUIDs are escaped by WordPress, an image URL like https://images.unsplash.com/photo-1491895200222-0fc4a4c35e18?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=2767&q=80 gets turned into https://images.unsplash.com/photo-1491895200222-0fc4a4c35e18?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=2767&q=80 in the GUID, which leads to the wrong image URL being used by the media display and other functions (e.g. Elementor).

For my own purposes, I'm working around this with:

add_filter('wp_get_attachment_url', 'html_entity_decode', 99999, 1);

But a more robust version might look like:

add_filter('wp_get_attachment_url', function($url, $id) {
    if ( $url === get_the_guid($id) ) return html_entity_decode($url);
    return $url;
}, 99999, 2);

Along with changing the current get_attached_file filter to decode the guid before returning it.

pjeby avatar Aug 15 '19 16:08 pjeby

I just tried the image URL you provided. Indeed '&' is encoded as '&' when the URL is saved as GUID in database. But seems that the image can still be correctly displayed when the media is inserted in post, even with Elementor.

Can you provide some more detailed steps for me to reproduce the issue?

zzxiang avatar Aug 17 '19 13:08 zzxiang

That's odd - try this URL instead: https://pbs.twimg.com/media/D_NKa3yWkAYwZwn?name=900x900&format=png -- I just noticed that for some reason the unsplash one is now working without the filter, but this one is not. I wonder if unsplash deliberately changed their URL handling to support miscoded URLs?

pjeby avatar Aug 17 '19 13:08 pjeby

You are right. The URL from pbs.twimg.com does not work. It is possible that the web server of the external image deliberately handles encoded URLs. WordPress plugins should not rely on that.

zzxiang avatar Aug 17 '19 13:08 zzxiang

Also, I just figured out why the unsplash one is "working" -- it is simply ignoring all the parameters that begin with amp; and displaying the default full-size image. So it only appears to work.

pjeby avatar Aug 17 '19 14:08 pjeby

After some investigation, I found that & is not converted to & by htmlentities, i.e. the opposite of html_entity_decode. Instead, it is converted by wp_filter_kses which is a WordPress function acting as a pre_post_guid filter to sanitize post GUIDs on both save and display.

wp_filter_kses eventually calls wp_kses_normalize_entities, which converts AT&T to AT&T, : to :, &#XYZZY; to &#XYZZY; and so on.

Therefore, I think using html_entity_decode to fix the issue might introduce other problems, as it is not the exact opposite of wp_kses_normalize_entities.

The post GUID sanitization was introduced in a commit in 2011. It is so long ago that seems that no one knows the reason why it is needed, even if I've asked the WordPress core team in their Slack channel.

I think a better solution might be to remove the filter before external media links are added and resume it after the links are added.

But that means the links already added before the fix are applied need to be removed and added again.

zzxiang avatar Sep 02 '19 09:09 zzxiang

There is a reason -- sort of -- that it's needed, or at least that gets it used. Post GUIDs are output in the RSS feeds without any escaping, but the & signs need to be escaped for proper XML. So, it is actually correct, in that sense, sort of, even though really the escaping should be done by the RSS template, not in the database. (But it's way too late to fix that in WP, of course.)

Rather than remove the filter and replace it, though, it may be simpler to just temporarily add a high-priority filter that returns the desired value, since then it doesn't matter if another plugin is messing with those filters, too.

(But that's just me thinking out loud because I need to change Imposer and Postmark to deal with this issue as well. Right now I haven't used any GUIDs with & in them for those tools, but it will become a major issue for anyone who does.)

pjeby avatar Sep 02 '19 15:09 pjeby

I see.

I did some experiment and inspected the core source code, and found that in fact WordPress core does convert & to & while exporting RSS2 feed, even if I changed the & back to & via MySQL client. The conversion is done by function wptexturize in wp-includes/formatting.php. The function is added as the default the_content filter to be called on RSS2 export. So seems that removing the wp_kses_normalize_entities filter will be fine.

But I haven't tried other feed formats yet. And I think your consideration makes sense. It is safer not to remove the filter added by core.

As far as I know, if the URL of an external media contains & and the media is added by External Media without Import, the only places where the media cannot be correctly displayed are:

  • the Media -> Library page which only the site administrator can access, and
  • when you are editing a post and insert the media into the post editor. (However, the media can be correctly displayed in the editor if it is inserted into a classic paragraph block.)

I haven't found other places where the media can't be correctly displayed.

Yes, the media can be correctly displayed when the post is published and viewed by non-admin users.

I shall try to find a solution which does not remove the default filters and makes the media correctly displayed in the Media Library page and the post editor page.

zzxiang avatar Sep 22 '19 04:09 zzxiang

The place I discovered it didn't work was as a background image in Elementor, as entities are not valid in CSS, and the URL is used in Elementor-generated CSS.

For Elementor I've ended up using a different solution for inserting external images, and I now use this bit of code in Imposer and Postmark to detect altered GUIDs and force their db content.

So if you decide not to support such use cases, it won't actually affect me at this point, as I ended up using my "paste URL in the search box" hack as my way of using external images with Elementor, rather than this plugin. Thanks for investigating anyway, and for inspiring said URL-paste hack, which I wouldn't have known how to write without studying your code.

pjeby avatar Sep 22 '19 05:09 pjeby