solidus_starter_frontend

rethink canonical links

Open fthobe opened this issue 11 months ago • 7 comments

[!CAUTION] Blocked by #417

Brief definition of what a canonical does:

A canonical link tells a search engine which page in a collection of similar or identical pages is the primary one.

What google says:

Canonicalization is the process of selecting the representative –canonical– URL of a piece of content. Consequently, a canonical URL is the URL of a page that Google chose as the most representative from a set of duplicate pages. Often called deduplication, this process helps Google show only one version of the otherwise duplicate content in its search results.

In effect, if I don't set it correctly, none of my pages is considered truly relevant: they all split relevance among each other, and search engines usually penalize all of them.

What's the issue?

Current canonical tag setup

Any product is reachable under multiple URLs:

  1. https://example.com/products/{slug}
  2. https://example.com/products/{ID}
  3. https://example.com/products/{historyslug} (if present)

One URL is current; all the others are history slugs, non-search-friendly, or kept only for legacy reasons (off-site links that still work). URL 1 could rank well, but it has to share visibility with 2 (which doesn't rank well because its path contains no keywords) and 3 (which receives no proper internal linking, since all internal links point to the current slug). URL 1 carries the current slug and is the intended canonical URL. URL 2 should either redirect or return a 404, depending on how you want to see it, and 3 should 301-redirect to 1 to avoid losing backlinks previously earned from other websites.
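The redirect rules described above can be sketched without Rails. This is a minimal, framework-free illustration, not the code from #413: the `Product` struct is hypothetical and stands in for `Spree::Product` plus FriendlyId's slug history, and it follows the "redirect" option for numeric IDs rather than the 404 alternative.

```ruby
# Hypothetical in-memory product record. In a real Solidus app this data
# would come from Spree::Product and FriendlyId's slug history.
Product = Struct.new(:id, :slug, :history_slugs, keyword_init: true)

# Decide how to answer a request for /products/{param}.
# Returns [status, location], where location is nil unless redirecting.
def resolve_product_request(param, products)
  # 1. Current slug: serve the page itself.
  return [200, nil] if products.any? { |product| product.slug == param }

  # 2. Numeric ID or 3. historical slug: permanent redirect to the current slug.
  legacy = products.find do |product|
    product.id.to_s == param || product.history_slugs.include?(param)
  end
  return [301, "/products/#{legacy.slug}"] if legacy

  # Anything else is unknown.
  [404, nil]
end
```

With this in place, only the current-slug URL is ever served with a 200, so it is the only page that needs to carry a canonical link.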

The commit message here contains

Generates a simple canonical tag based on the request path,…

which is exactly the opposite of what canonical tags are made for: indicating the URL of the primary HTML page, rather than the request path, so that search engines know which page is dominant among a collection of pages derived from it, and duplicate content is avoided.

What should be done?

Throw out all current canonical logic and reduce the canonical

A sane default would be for the canonical to always render the correct, current {storeurl}/{language}/{resourceroute}/{resource-slug}. Globalize should therefore probably override something here for translated content.
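The default described above can be illustrated with a small builder that derives the canonical from the resource's current slug and never from the request path. This is a sketch under assumptions: `canonical_url` and `canonical_link_tag` are hypothetical names, not the helper shipped with solidus_starter_frontend, and the optional `locale` argument only approximates what a Globalize-aware override might supply.

```ruby
require "erb"

# Build the canonical URL from the resource itself, never from the request path.
# All keyword arguments are hypothetical stand-ins for store URL, locale,
# resource route ("products", "t", ...) and the resource's *current* slug.
def canonical_url(store_url:, route:, slug:, locale: nil)
  segments = [store_url.chomp("/")]
  segments << locale if locale # e.g. "de" for translated content
  segments << route
  segments << slug             # current slug, even if a history slug was requested
  segments.join("/")
end

# Render the rel="canonical" link element (RFC 6596) with an HTML-escaped href.
def canonical_link_tag(url)
  %(<link rel="canonical" href="#{ERB::Util.html_escape(url)}">)
end
```

Because the slug comes from the record rather than the URL that was requested, all three URLs from the reproduction steps would yield the same canonical.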

What should also be done?

We have mitigated the problem through #413 (which you should approve :)) by redirecting FriendlyId history URLs and IDs (as in example 2) to the current slug. So while the construction of the canonical is still not great, the problem is mitigated. We are working on getting the same behavior for taxons and for content / blog pages.

Solidus Version: Any

To Reproduce

Create a product and navigate to that product via:

  1. https://example.com/products/{slug}
  2. https://example.com/products/{ID}
  3. https://example.com/products/{historyslug} (if present)

Current behavior

All three URLs return distinct canonical links despite being the same resource.

Expected behavior

URLs 2 and 3 return 301 redirects to 1, and 1 has a canonical link matching its configured slug.

fthobe, Feb 04 '25 19:02

@benjaminwil https://github.com/solidusio/solidus_starter_frontend/pull/413 partially fixes the problem, but we should think about how to fix canonicals holistically.

The trailing slash should also be discussed. I am not convinced that its full impact on SEO has been considered. The reality is that the decision taken here was very opinionated:

  • not everybody comes from a platform that enforced trailing slashes; for everyone else, this requires a redirect for every taxon slug that doesn't have one
  • you override admin backend decisions: if I don't set a trailing slash, I shouldn't be forced (without notice) to have one in a barely visible canonical tag. I think this goes beyond the scope of a starter frontend (representation, not configuration)
  • it can be set server-wide elsewhere
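One way to honor the points above is to make the trailing slash an explicit, store-level choice instead of a hard-coded decision in the template. The sketch below assumes a hypothetical `policy` setting with two values; neither the setting nor `apply_trailing_slash` exists in solidus_starter_frontend. Whatever policy is chosen would also have to drive routing and redirects, so the canonical never points at a URL that itself redirects.

```ruby
# Hypothetical store-level setting: the store owner decides, the template obeys.
TRAILING_SLASH_POLICIES = %i[never always].freeze

# Normalize a canonical URL according to the configured policy.
def apply_trailing_slash(url, policy: :never)
  unless TRAILING_SLASH_POLICIES.include?(policy)
    raise ArgumentError, "unknown trailing slash policy: #{policy.inspect}"
  end

  stripped = url.sub(%r{/+\z}, "") # drop any trailing slashes first
  policy == :always ? "#{stripped}/" : stripped
end
```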

fthobe, Feb 04 '25 19:02

Hey @fthobe, this seems like a feature request that you have tagged as a bug report. The behaviour you are describing in this ticket has never been part of the functionality provided by solidus_starter_frontend, so I am not sure we can consider this a bug. You have outlined your desired behaviour of the system, which may or may not match the intention behind what the framework provides. In the future, I think it would also be helpful to keep issues focused on a single item so we can respond to and address them appropriately, rather than outlining what looks like a number of feature requests and tagging that as a bug.

As a reminder, the idea behind this repo is to provide you with a starting point, which you can customize and tailor to your needs, and I think some of the behaviour you are looking for here may just be something that should live in your application, rather than in the template provided here.

The main issue you seem to have is with the routing in the starter frontend, which allows products to be accessed through different URLs. I don't see this as a bug, as the starter frontend does not link to any of the product routes by ID or legacy slug. You also mentioned that this has been changed in an open PR, so that part of the issue will be resolved.

I think a reasonable feature request may be to generate canonical tags pointing to the current slug when a historical one is used, but that is not a bug, rather an enhancement, since that behaviour has never existed.

As for the trailing slash handling, that is left to you to customize if you are not satisfied with what is provided. We've recently removed the dependency on the canonical_tag helper (part of the https://github.com/jumph4x/canonical-rails gem) in favour of a helper method that will be copied into your application, where you can fully customize and extend the behaviour.

Lastly, as someone who contributed to this work, I feel mildly offended by your suggestion to throw out the current implementation. If you want to propose a different approach, feel free to open a PR with the changes you want, rather than suggesting the existing functionality is a bug when it doesn't meet your expectations.

forkata, Feb 05 '25 01:02

Hey @forkata

I am really sorry for offending you, honestly.

The implementation is just wrong and does not respect what a canonical is meant for (providing an authoritative URL for similar or identical content). I don't know how it was done before, and while I believe any reduction in abstraction layers (removing gems) is a good thing, I feel an opportunity to define sane defaults was missed.

It is nevertheless true that, in my (personal) opinion, significant aspects of SEO have not been respected in this repo, and it is sometimes hard for me to understand how certain changes, surely made with the best intentions, produce SEO-relevant data that simply does not reflect the state of the industry. I have had this discussion with all three maintainers from blish / Nebulab / sg, and I have the feeling that the starter is not treated the way it should be; but, well, different opinions: the maintainers have a very technical view of things, and I have a very commercial one. I believe the starter should provide sane defaults and implementation templates for SEO as well.

I understand that SEO is of lower importance in certain markets because of broader access to VC funding (the US, for example), CPC pricing, or less competition. Germany has traditionally been very strong in this respect. Having built my career there in a very competitive industry, I naturally put more emphasis on SEO than others might: whatever the funding situation, for ~25 years now the German startup landscape has been 80% eCommerce, creating a very competitive environment where SEO is key.

I kindly ask you to take this in the most collaborative spirit when I tell you that this implementation (whether sourced externally from a gem or built internally) is flawed. From a trailing slash point of view, the change should have come with a big warning (personally, I wouldn't have made it). From a canonical link perspective, it doesn't solve the issue but instead contributes to the very problem canonicals are meant to solve (as far as I understand, both before and after the change). In my personal opinion, this PR missed an opportunity to fix the issue.

I think a reasonable feature request may be to generate canonical tags pointing to the current slug when a historical one is used, but that is not a bug, rather an enhancement, since that behaviour has never existed.

I think it should either be removed or provide a sane default; the current one is not expected behavior by the very definition of what a canonical should do. So no, I don't think Solidus gets to define what a canonical is, and if the starter frontend provides one, it should behave in line with what a canonical is expected to do.

I hope this clears the air between us. We will make a PR that is more in line with what I think it should be, as I believe the current implementation sets the wrong starting point with unhealthy defaults.

We have near-exhaustive SEO documentation on best practices; if you want, I can send you a copy. I am currently trying to find the time to update it anyway.

Have a great day!

fthobe, Feb 05 '25 14:02

@fthobe Thanks for your response, no hard feelings here, though I would love it if you were more careful with your choice of words in the future! A lot of hard work from the community goes into maintaining the Solidus ecosystem of gems, so I don't think suggesting that anything should be thrown out is productive or helpful in improving the framework, if it doesn't work for your use-case 😄

With that in mind, can you update the issue title to reflect the content? I don't see any steps to reproduce the taxons issue; this issue seems focused on products, so let's remove that from the title and create a separate issue if the behaviour you expect is also broken on taxon pages.

Since there is a PR for the routing changes, I assume that part of this issue is no longer relevant, because the resources won't be accessible under different routes. Is there anything other than handling legacy product slugs that remains outstanding as part of this issue?

Update: You mentioned an issue with the trailing slash handling. Do you mind opening a separate issue with steps to reproduce the problem you are seeing? It would be helpful to identify that and tackle it separately from this one.

forkata, Feb 08 '25 00:02

if it doesn't work for your use-case

Man... now I am getting mildly offended.

Neither I, nor you, nor any higher spiritual being defines the canonical; usually RFC 6596 does.

https://www.rfc-editor.org/rfc/rfc6596.html

If you want a less dense version of that text, the Wikipedia article about canonical links defines it as well.

So if you additionally need J. Kupke to personally explain to you the meaning of a canonical and its application as he described it, I strongly believe I can arrange that, flat-out serious!

What was done there (at whatever stage of the process, before or after the canonical gems, and probably already in the initial commit) simply does not serve the universally well-documented purpose of a canonical. I literally wrote, line by line, with links and everything, an essay about the technical purpose of a canonical, because I believe it's important that people who work in eCommerce understand the fundamentals of SEO.

So the fact is: it wasn't OK before, it wasn't OK after, and on top of that there was an undocumented change in URL behaviour, mentioned only in a comment deep down in the code. This was not something I constructed.

I personally do not want to ruin your experience of collaborating with me, but I will not sit down and praise that PR, and on top of that be told that I am generalising my very particular use case when I try to push the starter template toward tested best practices and a 13-year-old, well-defined internet standard. I simply believe it was, among surely many great ones you did, a subpar PR, and at this point I would warmly advise you to read the documentation I gave you. You will surely figure out that no change made in this PR added more value than removing a gem, while it missed 2 issues and artificially created an undocumented and unexpected behaviour (so a third issue), no matter how much that offends you.

I am sorry, but in my mind, as much as I truly apologise for offending you, you have no valid argument. I would personally love to create a starter frontend with the quality of a production website, because it reflects the idea of a platform and the spirit that having control over features and data still makes sense in 2024, and I would really enjoy being aligned with you people on that. For me, this includes learning Ruby better and literally relearning command-line git after 15 years; for you, sitting on the other end, it also includes picking up an article about canonicals and educating yourself ;)

fthobe, Feb 08 '25 03:02

You really didn't need to write (and make me read) a small essay to point out that the implementation is incomplete.

Please proceed with suggesting a solution. We want canonicals to work correctly on the project.

jarednorman, Feb 10 '25 20:02

We will try to backport what we have.

fthobe, Feb 10 '25 20:02