[5.2] SEF: Enforcing correct SEF URL

Open Hackwar opened this issue 1 year ago • 15 comments

Summary of Changes

Joomla has been steadily improving its SEO performance, but one issue is still open: the same page can be reached via different URLs. For example, the same article can be reached via its intended SEF URL, via the /component/content/article/<id> URL, or even via index.php?option=com_content&view=article&catid=<catid>&id=<id>.

This PR adds a new feature which redirects URLs with query parameters to their SEF variant. The router parses the URL, and if the original URL contained query parameters, it tries to build a new URL from the information it parsed. If that new URL no longer contains any query parameters, the router redirects to it. This removes many of the duplicate URLs for a single content item. The redirect only happens for GET requests.
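
The decision described above can be sketched roughly as follows. This is an illustration in Python, not the code from this PR (Joomla itself is written in PHP). The names `sef_redirect_target`, `toy_parse`, `toy_build` and the `ARTICLES` table are hypothetical stand-ins for the router; only ID 18 ("Second Blog Post") comes from the testing instructions, and ID 17 for the first post is an assumption.

```python
from typing import Callable, Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit

QueryVars = Dict[str, str]


def sef_redirect_target(
    method: str,
    requested_url: str,
    parse: Callable[[str], QueryVars],
    build: Callable[[QueryVars], str],
) -> Optional[str]:
    """Return the URL to redirect to, or None if no redirect should happen."""
    if method.upper() != "GET":
        return None  # only GET requests are redirected
    if not urlsplit(requested_url).query:
        return None  # no query parameters: nothing to clean up
    candidate = build(parse(requested_url))
    if urlsplit(candidate).query:
        return None  # rebuilt URL still has query parameters: leave it alone
    return candidate


# Hypothetical routing table standing in for the real router (assumption).
ARTICLES = {
    17: "/park-blog/first-blog-post",   # ID 17 is assumed for illustration
    18: "/park-blog/second-blog-post",  # ID 18 is taken from the test steps
}
PATH_TO_ID = {path: article_id for article_id, path in ARTICLES.items()}


def toy_parse(url: str) -> QueryVars:
    """Turn a URL into query variables; explicit query parameters win."""
    parts = urlsplit(url)
    query_vars: QueryVars = {}
    if parts.path in PATH_TO_ID:
        query_vars.update(
            option="com_content", view="article", id=str(PATH_TO_ID[parts.path])
        )
    query_vars.update(parse_qsl(parts.query))
    return query_vars


def toy_build(query_vars: QueryVars) -> str:
    """Build a URL from query variables, keeping anything it cannot consume."""
    raw_id = query_vars.get("id", "")
    path = ARTICLES.get(int(raw_id)) if raw_id.isdecimal() else None
    if path is None:
        return "/index.php?" + urlencode(query_vars)
    consumed = ("option", "view", "id", "catid")
    leftover = {k: v for k, v in query_vars.items() if k not in consumed}
    return path + ("?" + urlencode(leftover) if leftover else "")


if __name__ == "__main__":
    url = "/park-blog/first-blog-post?id=18"
    print(sef_redirect_target("GET", url, toy_parse, toy_build))
```

In this sketch, a request that carries an extra parameter the toy router cannot consume (for example `&foo=bar`) is left untouched, because the rebuilt URL would still contain a query string.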

It should be noted that this does NOT remove every case of duplicate content. For example, additional URL parameters are still kept.

I'd like to thank ithelps Digital for sponsoring this feature.

Testing Instructions

  1. Apply the patch
  2. Use some content to test this with, for example the testing sample data.
  3. Go to /park-blog/first-blog-post on your site (this path exists when you use the testing sample data).
  4. Add ?id=18 to the URL and see that you actually get the "Second Blog Post" displayed instead of the first one.
  5. Enable the option in the SEF plugin.
  6. Reload the frontend page and see that you are redirected to /park-blog/second-blog-post.

Link to documentations

Please select:

  • [X] Documentation link for docs.joomla.org: https://docs.joomla.org/Search_Engine_Friendly_URLs

  • [ ] No documentation changes for docs.joomla.org needed

  • [ ] Pull Request link for manual.joomla.org:

  • [ ] No documentation changes for manual.joomla.org needed

Hackwar avatar Feb 21 '24 22:02 Hackwar

I see what you want to achieve here, but:

  • how can a new feature not require documentation (especially such a critical one)?
  • the whole feature sounds like it's solved when we implement canonical tags

bembelimen avatar Feb 22 '24 09:02 bembelimen

Sorry about the documentation. It was late yesterday and I will create documentation for the whole SEF thing in the next 48 hours.

Regarding the canonical tag I would disagree. To me, the canonical tag is just a bandaid to get around using the right URL and thus using the right URL in the output is the first step (which we do in Joomla anyway). The second step would then be to redirect wrong URLs to the right ones instead of keeping them and slapping a canonical on there.

Hackwar avatar Feb 22 '24 09:02 Hackwar

I have tested this item ✅ successfully on 6ae36cb3509f4b4bf8ff2f3c323ca257401c7cf6


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/42854.

ceford avatar Feb 23 '24 09:02 ceford

I have tested this item ✅ successfully on 6ae36cb3509f4b4bf8ff2f3c323ca257401c7cf6



nielsnuebel avatar Feb 24 '24 10:02 nielsnuebel

RTC



richard67 avatar Feb 24 '24 10:02 richard67

Link to documentation has been added.

Hackwar avatar Feb 24 '24 11:02 Hackwar

I've played around with this PR using the "testing" dataset in Joomla 5.1.x.

I've run into an edge case:

  • The testing dataset has an article that isn't accessible via its own menu item but only via a category list. It's article ID 1, "Administrator Components": index.php/content-component/article-category-list/extensions/components/administrator-components
  • If I now open the "Archive Module" menu item (/index.php/content-component/archive-module) and add an extra query parameter to override the article ID with ID 1 (/index.php/content-component/archive-module?id=1), I get redirected to this URL: /index.php/content-component/archive-module?view=article&id=1, whereas I would have expected a redirect to the category list URL mentioned above.

I'm aware that this is not a quirk of the current PR but of the router in general. However, the current PR might amplify its effects.

That's why I'm asking myself if we should limit the redirection to "clean" URLs without any leftover query parameters.

Thoughts? @Hackwar

SniperSister avatar Feb 24 '24 11:02 SniperSister

I agree, this sounds good. I'll add this to the PR.

Hackwar avatar Feb 24 '24 11:02 Hackwar

I encountered additional issues and will convert this back to draft first.

Hackwar avatar Feb 24 '24 12:02 Hackwar

Back to pending.



richard67 avatar Feb 24 '24 12:02 richard67

What about a generic fetch/XHR request to index.php?option=com_xxx&...? Do you mean that with this option enabled it will no longer be possible to perform a GET request to a raw Joomla URL? That would be too generic.

joeforjoomla avatar Feb 24 '24 12:02 joeforjoomla

> What about a generic fetch/xhr request to index.php?option=com_xxx&...

Fetch / XHR requests do follow redirects automatically.

SniperSister avatar Feb 24 '24 12:02 SniperSister

it works, perfect



miguelvazquez avatar Feb 24 '24 16:02 miguelvazquez

I have tested this item ✅ successfully on 33d1153c7624eeba138f993783eb3abd907ba79a

I've tested the following scenarios:

  • On a local installation with the test dataset, I crawled the whole site before applying the patch and enabling the feature. After applying the patch, I recrawled the site and got no 404s.
  • I've tried various non-SEF URL patterns (index.php?Itemid=..., index.php?option=com_users&view=reset&Itemid=...) and got the expected result
  • I've tested the example provided in the test instructions

SniperSister avatar Mar 13 '24 09:03 SniperSister

I refactored the PR to use the new option from #43432 to reduce the number of options we add here.

Hackwar avatar May 11 '24 14:05 Hackwar

I have not tested this item.

Sorry, I can't reproduce the error with the given instructions. I am always redirected to the 404 error page.



KingLouis1 avatar Jul 15 '24 17:07 KingLouis1

I have tested this item ✅ successfully on 09f8ba345508d55ca324fa01218c5eef2f191869

I tested this using a configuration which resulted in a /component/content URL. Then I enabled a menuitem which matched this URL, and the code successfully did a 301 redirect to the SEF URL for that menuitem. I repeated this for several cases and they all worked OK.

I also set up a site module which was a list of articles in a given category. The URLs of the articles were based on a category list menuitem, and when I clicked on one of the links it routed me correctly to the article.

I then set up a menuitem pointing directly at the single article and reloaded the site page. It didn't redirect to the new menuitem (as there weren't any query parameters on the URL). But when the module displayed the links it utilised the menuitem for that single article.



robbiejackson avatar Jul 26 '24 21:07 robbiejackson

RTC



Hackwar avatar Aug 15 '24 10:08 Hackwar

Thanks @Hackwar !

pe7er avatar Aug 15 '24 20:08 pe7er