
Use correct protocol for canonical URLs

Open · GaryJones opened this issue 3 years ago · 8 comments

There is currently a Force HTTPS Canonicals setting that changes the protocol of URLs in the structured metadata, regardless of whether home_url() uses https.

By default, this setting is false (that is, http is used), even when the site's home URL uses https.

As a result, a WP instance at https://example.com gets URLs that start with http://example.com. These are not correct canonical URLs, since a canonical URL includes the protocol.
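To make the mismatch concrete, here is a minimal sketch in Python (the plugin itself is PHP) contrasting the current flag-driven behaviour with the proposed home_url()-driven one. The function names are hypothetical, and the sketch assumes the setting only swaps the URL scheme, leaving host, path and query untouched.

```python
from urllib.parse import urlsplit


def canonical_with_flag(url: str, force_https: bool) -> str:
    """Current behaviour as described above: the scheme depends only on the setting."""
    scheme = "https" if force_https else "http"
    return urlsplit(url)._replace(scheme=scheme).geturl()


def canonical_from_home(url: str, home_url: str) -> str:
    """Proposed behaviour: the scheme follows the site's home URL."""
    scheme = urlsplit(home_url).scheme
    return urlsplit(url)._replace(scheme=scheme).geturl()


home = "https://example.com"
post = "https://example.com/2021/03/hello-world/"

# With the default (force_https=False), an https site emits http canonicals.
print(canonical_with_flag(post, force_https=False))  # http://example.com/2021/03/hello-world/
# Following home_url() keeps the protocol the site actually uses.
print(canonical_from_home(post, home))               # https://example.com/2021/03/hello-world/
```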

A customer support ticket suggests that the Yoast SEO plugin uses the correct protocol (as would make sense), but this needs confirming.

Existing customers will have their data indexed under http (unless they've already changed the setting), so implementing behaviour that checks home_url() and adjusts the protocol used in canonical URLs would require a re-index.

As such, any change in behaviour here should apply only to new installs: check for a version number saved in the options, and change the behaviour only if that version is present and equal to or greater than the release in which the change is implemented.
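The version gate could look roughly like the Python sketch below. `CHANGE_VERSION` is a placeholder, not a real release number, and the sketch assumes the stored option records the version the site was first installed with (if the stored version were bumped on every upgrade, existing sites would silently switch behaviour). It also assumes plain numeric versions without suffixes such as "-beta".

```python
from typing import Optional, Tuple

# Hypothetical: the first release that derives the canonical protocol from home_url().
CHANGE_VERSION: Tuple[int, int, int] = (3, 0, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as "3.0" into (3, 0, 0)."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad so "3.0" compares equal to "3.0.0"
    return tuple(parts)


def use_home_url_protocol(saved_version: Optional[str]) -> bool:
    """True only if a version was saved and it is >= CHANGE_VERSION."""
    if not saved_version:
        # No stored version: treat as an existing install and keep the old behaviour.
        return False
    return parse_version(saved_version) >= CHANGE_VERSION


print(use_home_url_protocol(None))   # False
print(use_home_url_protocol("2.5"))  # False
print(use_home_url_protocol("3.0"))  # True
```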

Alternatively, the change is made for everyone, but all affected sites will need a reindex (can that be notified or triggered remotely?). On a site with 400-500k URLs, this could take around a day.

When this new behaviour is implemented (canonical URL protocols match the home_url() protocol), then the Force HTTPS Canonicals setting can be hidden. It could even be considered deprecated and marked for removal in a future major release.

One additional step would then be to add an admin notice, shown when the home URL is changed on the Settings->General screen, advising that a Parse.ly reindex is needed.
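The decision behind such a notice could be sketched as follows, in Python for illustration. In the plugin this would presumably hang off WordPress's `update_option_{$option}` action for the `home` option, which receives the old and new values. The function name and the notice wording are hypothetical, and treating any change of scheme, host or path as reindex-worthy is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def _normalise(url: str) -> tuple:
    parts = urlsplit(url)
    # urlsplit() already lower-cases the scheme; hosts are case-insensitive;
    # a trailing slash on the home URL does not change canonical URLs.
    return (parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"))


def home_url_change_notice(old_home: str, new_home: str) -> Optional[str]:
    """Return notice text if the change affects canonical URLs, else None."""
    if _normalise(old_home) == _normalise(new_home):
        return None
    return (
        "Your home URL changed from {} to {}. Canonical URLs sent to Parse.ly "
        "will change too, so a Parse.ly reindex is needed.".format(old_home, new_home)
    )


print(home_url_change_notice("http://example.com", "https://example.com"))
```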

GaryJones · Mar 30 '21

@GaryJones, I'd like to propose another idea. What do you think?

> This has the effect that a WP instance at https://example.com will get URLs that start with http://example.com which therefore makes it incorrect as being the "canonical URL", since canonical URLs need to take into account the protocol as well.

What about setting the correct value of the Force HTTPS Canonicals setting for new installs? E.g.:

  • If home_url() uses HTTPS, set the Force HTTPS Canonicals setting to true.
  • Otherwise, set the Force HTTPS Canonicals setting to false.

That way, we would not need to handle different implementations for different versions, right?
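A minimal Python sketch of this proposal, assuming defaults are written once on a fresh install and never recomputed for existing sites. The `force_https_canonicals` key matches the option name mentioned later in the thread; `initial_options` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def default_force_https_canonicals(home_url: str) -> bool:
    """True when the home URL uses HTTPS, otherwise False."""
    # urlsplit() lower-cases the scheme, so "HTTPS://..." is handled too.
    return urlsplit(home_url).scheme == "https"


def initial_options(home_url: str) -> dict:
    """Hypothetical defaults, written only on a fresh install."""
    return {"force_https_canonicals": default_force_https_canonicals(home_url)}


print(initial_options("https://example.com"))  # {'force_https_canonicals': True}
print(initial_options("http://example.com"))   # {'force_https_canonicals': False}
```

Because the value is stored rather than derived on every request, existing installs keep whatever is already saved, which is what removes the need for per-version code paths.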

> When this new behaviour is implemented (canonical URL protocols match the home_url() protocol), then the Force HTTPS Canonicals setting can be hidden. It could even be considered deprecated and marked for removal in a future major release.

> One additional thing would be to then add an admin notice for when the home URL is changed on the Settings->General screen, to advise that a Parse.ly reindex is needed.

  • If we notice a difference between home_url() and the Force HTTPS Canonicals setting, I think we should display an admin notice advising users to change the Force HTTPS Canonicals setting and contact Parse.ly support to re-run the index?
  • Otherwise, we only advise them, on the settings page, not to change this option.
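The two branches above could be sketched in Python like this. The function name, the returned location labels and the message texts are all hypothetical; the sketch only shows the comparison between the home URL's protocol and the stored setting.

```python
from typing import Tuple
from urllib.parse import urlsplit


def settings_notice(home_url: str, force_https_canonicals: bool) -> Tuple[str, str]:
    """Return (where to show it, message) for the current configuration."""
    home_is_https = urlsplit(home_url).scheme == "https"
    if home_is_https != force_https_canonicals:
        # Canonical URLs use a different protocol from the site itself.
        return (
            "admin_notice",
            "Force HTTPS Canonicals does not match your home URL. Change the "
            "setting and contact Parse.ly support to re-run the index.",
        )
    # Setting already matches: only warn on the settings page.
    return (
        "settings_page",
        "Avoid changing this option; doing so requires a Parse.ly reindex.",
    )


print(settings_notice("https://example.com", False)[0])  # admin_notice
```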

htdat · May 04 '21

As mentioned in the description of PR https://github.com/Parsely/wp-parsely/pull/285, I've slightly changed the second part of my suggestion above:

NOTE: there is a slight change to (2) compared with this comment https://github.com/Parsely/wp-parsely/issues/174#issuecomment-831762629, as I do not want all existing users to contact a Parse.ly support rep to run re-indexing. That might create unnecessary requests for the Parse.ly support team.

htdat · May 05 '21

Hi team, we recently had some issues with a large publisher migrating to the plugin: their old canonicals did not match the new ones because the wrong Force HTTPS Canonicals option was chosen. Can we update the language to mention that, if you are migrating to the plugin from another CMS, you should make sure your canonicals match your old canonicals, or else you will have split records in the Parsely dashboard?

evantwidwell · Nov 18 '21

> Can we update the language to mention that if you are migrating to the plugin from another CMS to make sure that your canonicals will match your old canonicals or else you will have split records in the Parsely dashboard.

Migrations from another CMS seem like a rare case compared to the regular case of someone just adding this plugin.

In another issue, we've suggested adding an onboarding wizard; that seems like a better place for the reminder you've suggested.

GaryJones · Nov 25 '21

> Migrations from another CMS seem like a rare case when compared to the regular case of someone just adding this plugin.

This [migrations from other CMSes] is what happens when new customers join WPVIP, no?

danielabloch · Dec 09 '22

> > Migrations from another CMS seem like a rare case when compared to the regular case of someone just adding this plugin.

> This [migrations from other CMSes] is what happens when new customers join WPVIP, no?

@danielabloch: Not necessarily, as they may be migrating from a site that is already on WordPress. However, if the setting in the new environment produces canonical URLs that do not match the previous ones, this problem will appear.

So it's not really about migrating from one CMS to another, but rather about keeping the canonical URLs identical through the plugin setting when moving between CMSs or hosting providers.
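A pre-migration check along these lines could catch split records before they happen. This is a hypothetical Python sketch, not an existing tool: it takes a sample of the old site's canonical URLs, assumes host and path are unchanged by the move (so only the scheme can differ), and reports which URLs would change under a given setting and which setting value, if any, keeps them all identical.

```python
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


def split_records(old_canonicals: Iterable[str], force_https: bool) -> List[Tuple[str, str]]:
    """List (old, new) pairs whose canonical URL would change under the setting."""
    scheme = "https" if force_https else "http"
    pairs = []
    for old in old_canonicals:
        new = urlsplit(old)._replace(scheme=scheme).geturl()
        if new != old:
            pairs.append((old, new))
    return pairs


def matching_setting(old_canonicals: Iterable[str]) -> Optional[bool]:
    """Setting value that keeps every old canonical identical, or None if mixed."""
    schemes = {urlsplit(u).scheme for u in old_canonicals}
    if schemes == {"https"}:
        return True
    if schemes == {"http"}:
        return False
    return None  # mixed or empty: no single setting value preserves all of them


old = ["https://example.com/a/", "https://example.com/b/"]
print(matching_setting(old))                   # True
print(len(split_records(old, force_https=False)))  # 2
```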

acicovic · Dec 13 '22

Need to confirm if this is still an issue.

mjangda · Feb 27 '23

@arhine, @thompsonjoshua, @LauraKalnicky, any feedback on this? Are things as they should be, or is there something to improve here?

acicovic · Mar 30 '23

@acicovic Picking up this discussion again, it appears that new installs still have Force HTTPS Canonicals set to False.

Can you confirm that is correct? And is there anything preventing us from setting that to True for new installs?

arhine · Oct 17 '24

Hello, I can confirm that per our default settings code, force_https_canonicals is set to false:

https://github.com/Parsely/wp-parsely/blob/b83be77db84dba709808a306bc67b77e6556f2b5/src/class-parsely.php#L112

And no, nothing stops us from setting this to true for new installs, or from using the home_url() function to set the correct protocol for new installs.

acicovic · Oct 17 '24