piranha.core
Hierarchical Slug Routing
From a related conversation in #1201, I was looking to get more information on the slug routing for page content.
Recap: On the topic of URL slug lengths, I can see how 128 characters can be too short for a full URL when using hierarchical, word-based routing, depending on the chosen URL structure.
That said, if the slug represents only the segment of the URL that a single piece of content represents (as it should, in my opinion), then 128 is more than sufficient. This only works if all routing is hierarchical, as it is for Piranha's archives (seen in image below), where the slug is only the URL segment below the blog itself.
Unfortunately, this isn't the case for nested pages, where the slug is the entire URL (seen below).
I'm used to content management systems that allow both: a hierarchical, structure-based URL (/about-me/sub-page-with-long-slug in the above example) by default, and an optional simple/short URL (/long-slug).
Is the non-hierarchical routing for page content intentional and purposeful, or is it just there to make routing/content resolution easier/faster?
@i-love-code Well, Piranha actually generates hierarchical slugs if the config option `Config.HierarchicalPageSlugs` is set to true (which it is by default). Since the slug is generated when the page is saved for the first time, this only happens if you actually create the page in the correct position. Nothing remaps slugs when you move a page, since that would change the permalink structure and generate dead slugs.
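To make the "generated on first save" behaviour concrete, here is a minimal Python sketch (not Piranha's C# code). `slugify` and `generate_slug` are hypothetical helpers with a simplified slug rule, and the `hierarchical` flag only plays the role of `Config.HierarchicalPageSlugs`:

```python
import re
from typing import Optional


def slugify(title: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def generate_slug(title: str, parent_slug: Optional[str] = None,
                  hierarchical: bool = True) -> str:
    """Build a page's slug once, on first save (hypothetical helper).

    `hierarchical` stands in for Config.HierarchicalPageSlugs.
    """
    own = slugify(title)
    if hierarchical and parent_slug:
        return f"{parent_slug}/{own}"
    return own


# Created at the root: no parent part, and nothing remaps it if the page
# is later dragged under about-me.
print(generate_slug("Sub page with long slug"))  # sub-page-with-long-slug
# Created directly under about-me: the parent's slug is baked in.
print(generate_slug("Sub page with long slug", parent_slug="about-me"))
```

Because the parent's slug is copied into the child's slug only at creation time, the stored value reflects where the page was created, not where it currently sits in the tree.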
This does, however, expose a weakness in the design: you will run out of space after a certain number of nested pages, since the 128-character slug limit applies to the whole hierarchical path, not to each level.
There are two possible solutions to this:
1. Increase the slug length, which solves both this issue and the fact that generated hierarchical slugs can easily exceed the limit.
2. Add dynamic routing that scans the hierarchical structure to find the matching page.
Solution 2 would also mean that permalinks are automatically updated when a page is moved, which is not optimal unless we also automatically generate alias rules that redirect the old permalink to the new one whenever a page is moved.
What's your opinion on this @i-love-code @eric-wilson @filipjansson?
Good to hear about the hierarchical slug option. I can see how it breaks down when pages are moved, since parent paths are saved in the slug at content creation time. The example in my last comment was actually caused by creating the page at the root and dragging it under about-me, which produced the incorrect/misleading slug.
Hierarchical URL Length
I don't believe any URL really needs to be 2,000 characters, but that does appear to be the de facto "limit" imposed by older browsers, which hierarchical routes would want to stay under. At some point, if hierarchical routing is used, I personally believe responsibility for the full URL length passes to the user managing their content. I don't believe organized content with a thoughtful URL strategy that still goes above 2k characters will be a common issue that needs addressing.
URL Segment Length
I don't see a reason (other than maybe performance) to enforce a URL segment maximum, but I do know platforms that get by very easily with a segment max of 30 characters. Within a roughly 2,000-character URL, that still gives the user ~60 levels of categorization/hierarchy for content.
It's worth noting that sites hosted on IIS have a default maximum URL segment count of 32, meaning each segment would need to support upwards of 60 characters to make the full URL length usable.
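A quick back-of-the-envelope check of the two figures above, written in Python. It assumes every segment costs its own length plus one `/` separator and uses the 2,000-character figure cited earlier; the segment limits are the ones quoted in the thread, not verified values:

```python
MAX_URL = 2000  # the de facto URL length cited above


def max_levels(segment_max: int, url_max: int = MAX_URL) -> int:
    """How many full-length segments fit, counting one '/' per segment."""
    return url_max // (segment_max + 1)


def chars_per_segment(segment_count: int, url_max: int = MAX_URL) -> int:
    """Longest segment (excluding its '/') if url_max is split evenly."""
    return url_max // segment_count - 1


print(max_levels(30))         # 64
print(chars_per_segment(32))  # 61
```

Both results line up with the rough "~60 levels" and "upwards of 60 characters" estimates.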
URLs - SEO and Aliases
I currently work mostly with a CMS that relies on a hierarchical, scanning-based routing system, so when pages are moved their URLs update to their parent's URL + their own slug. I do prefer this option, but some clients do manually track and set up redirects/aliases for old URLs.
I would love to see, and would happily contribute to, an integration between page/content moves and the alias system to autogenerate aliases for moved content. Depending on needs, it could even be optional: when the user moves content around the page tree or changes a URL segment, a dialog could ask whether they would like to generate an alias.
@tidyui @i-love-code @filipjansson,
Good catch. I should have clarified that I was using the hierarchical structure, which better explains my use case (I'm still learning the system and didn't think about it from the other perspective, sorry).
URL Length
I personally vote for the length increase. For max lengths, my suggestion is to go with industry standards: if Chrome supports a max of 2,000 characters, then that's what I would support in code (if it were up to me). That way no one can say the product is limited to x when industry standards are y. Also, remove the restriction from the Core and let users override it by creating a new ORM and migration set (which keeps it extremely open).
URL Segments
Option #2, scanning segment by segment (virtual directory by virtual directory), opens up some performance issues in addition to the accidental re-routing you mentioned.
URL Segment Max
Assuming the search is not scanning segment by segment, I don't see any reason to limit the segments here either. IIS's limitation is probably for performance when traversing a directory and its subdirectories. For Piranha's DB search, a query like `x => x.Slug == "category/sub-cat/sub-cat/sub-cat/page"` flattens it out without any performance issues.
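To illustrate the "flattened" lookup: when the full path is stored as the slug, resolving a request is a single equality lookup whatever the depth. This Python sketch uses a dict as a hypothetical stand-in for an indexed `Slug` column; the page IDs are made up:

```python
from typing import Optional

# Hypothetical stand-in for a unique index on the full Slug column.
PAGES_BY_SLUG = {
    "about-me": 1,
    "about-me/sub-page-with-long-slug": 2,
    "category/sub-cat/sub-cat/sub-cat/page": 3,
}


def resolve_exact(path: str) -> Optional[int]:
    """One equality lookup, however many segments the path has."""
    return PAGES_BY_SLUG.get(path.strip("/"))


print(resolve_exact("/category/sub-cat/sub-cat/sub-cat/page"))  # 3
```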
Auto URL Updates & Aliases
Automatically updating the URL route when a page is dragged and dropped into a new section would be a nice feature.
However, I'm guessing users would also like auto-generated redirects pointing to the new structure (or at the very least have this as optional behavior in the settings). I did this in my last homegrown CMS. If you do this, there are two gotchas (that I can think of) you need to be aware of:
- Accidental infinite redirects. You need to make sure that a redirect pointing to another URL that also has a redirect doesn't point back to the original redirect (I learned this the hard way). For example, you move page 'about-us' into a sub-directory 'company', then later move it back to the root. The initial move creates a redirect from /about-us to /company/about-us; the move back creates a redirect from /company/about-us to /about-us. Now if you go to /about-us it redirects to /company/about-us, which redirects back to /about-us, and thus the game of ping-pong begins.
- Also, if a page has been moved several times, you need to follow the chain internally to reach the last redirect; otherwise you may issue too many redirects down the pipe, and eventually Chrome will show a "too many redirects" error (again, learned the hard way 😉).
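Both safeguards can be sketched in a few lines of Python (not Piranha's C# code). Here `redirects` is a hypothetical mapping from old URL to new URL, and `max_hops` is an arbitrary cap, not a Piranha setting. The function follows the chain server-side and reports loops instead of issuing them:

```python
from typing import Dict, Optional


def resolve_redirects(path: str, redirects: Dict[str, str],
                      max_hops: int = 10) -> Optional[str]:
    """Follow alias rules internally and return the final URL.

    Returns None for a loop (like the about-us ping-pong) or a chain longer
    than max_hops, so the caller can log it instead of redirecting.
    """
    seen = {path}
    current, hops = path, 0
    while current in redirects:
        if hops == max_hops:
            return None  # chain too long
        current = redirects[current]
        if current in seen:
            return None  # loop detected
        seen.add(current)
        hops += 1
    return current  # equals path when no rule applies


ping_pong = {"/about-us": "/company/about-us", "/company/about-us": "/about-us"}
chain = {"/a": "/b", "/b": "/c", "/c": "/d"}
print(resolve_redirects("/about-us", ping_pong))  # None
print(resolve_redirects("/a", chain))             # /d
```

The caller would then send a single redirect from the requested URL straight to the returned one, so the browser never sees the intermediate hops.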
I'm happy to contribute to any of these as well.
Good convo. I've been wanting to discuss this topic in Piranha for a bit now.
URL Segment Max
I was just curious about this topic, given we're talking about character-length-restricted segments and hierarchical URLs with multiple segments. As long as performance doesn't suffer, I agree with a flexible/non-restricted URL length (or slug length, given that Piranha currently stores the full URL in the database as the slug).
URL Segment Scan / Full Slug in Database
Your comments above are correct: Piranha currently routes by checking for an exact path match.
I'd be interested to see how much slower routing becomes with a segment-scanning approach on large amounts of content, to judge whether it makes sense to keep storing full URLs as it does today. A segment-scanning approach would affect performance, but not using one adds a lot of management code to keep the full URLs in the database accurate when content is moved.
Moving a single piece of content sounds small, but moving or publishing a change to a top-level folder with 400 descendants means updating 400 rows in the DB. Additionally, this could result in 400 aliases if auto-aliasing on move were implemented.
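The fan-out described above can be sketched as a prefix rewrite over stored full slugs. This Python version is hypothetical: `slugs` stands in for the pages table (page id to full slug), and alias creation is optional:

```python
from typing import Dict, Tuple


def move_subtree(slugs: Dict[int, str], old_prefix: str, new_prefix: str,
                 make_aliases: bool = True) -> Tuple[int, Dict[str, str]]:
    """Rewrite the stored full slug of a moved page and all its descendants.

    Returns the number of rows rewritten and the old -> new alias rules.
    """
    aliases: Dict[str, str] = {}
    changed = 0
    for page_id, slug in list(slugs.items()):
        # Match the page itself and anything below it, but not e.g. "docs-old".
        if slug == old_prefix or slug.startswith(old_prefix + "/"):
            new_slug = new_prefix + slug[len(old_prefix):]
            if make_aliases:
                aliases[slug] = new_slug
            slugs[page_id] = new_slug
            changed += 1
    return changed, aliases


# A folder with 400 descendants: 401 rows (folder included) and 401 aliases.
pages = {0: "docs", **{i: f"docs/page-{i}" for i in range(1, 401)}}
changed, aliases = move_subtree(pages, "docs", "archive/docs")
print(changed, len(aliases))  # 401 401
```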
On the other hand, with a scanning-based approach you only need to store one URL segment per content item, but you'll need multiple lookups to resolve the content segment by segment, which, as you said, can be less performant. The system I currently use does exactly this and caches partial URLs aggressively to make later resolution easier.
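A Python sketch of segment-by-segment resolution with a partial-URL cache, as described above. The `(parent_id, slug)` index, `SegmentRouter` and its query counter are hypothetical, and `_find_child` stands in for one database call per uncached segment:

```python
from typing import Dict, Optional, Tuple


class SegmentRouter:
    """Resolve a path one segment at a time, caching partial URLs."""

    def __init__(self, children: Dict[Tuple[Optional[int], str], int]):
        self.children = children         # (parent id, own slug) -> page id
        self.cache: Dict[str, int] = {}  # partial URL -> page id
        self.queries = 0                 # simulated database round trips

    def _find_child(self, parent_id: Optional[int], slug: str) -> Optional[int]:
        self.queries += 1  # one simulated DB query per uncached segment
        return self.children.get((parent_id, slug))

    def resolve(self, path: str) -> Optional[int]:
        page_id: Optional[int] = None  # root pages have no parent
        prefix = ""
        for segment in path.strip("/").split("/"):
            prefix += "/" + segment
            if prefix in self.cache:
                page_id = self.cache[prefix]
                continue
            page_id = self._find_child(page_id, segment)
            if page_id is None:
                return None
            self.cache[prefix] = page_id
        return page_id


router = SegmentRouter({(None, "about-me"): 1, (1, "sub-page-with-long-slug"): 2})
router.resolve("/about-me/sub-page-with-long-slug")  # 2 queries, fills the cache
router.resolve("/about-me/sub-page-with-long-slug")  # served from the cache
print(router.queries)  # 2
```

A real cache would also need invalidating whenever a page is moved or renamed, which this sketch leaves out.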
URL Aliases / Infinite Loop
An internally developed/homegrown solution we currently use treats content-driven aliases differently from the aliases users can type in. (P.S. We learned this the hard way too, like you :D)
We allow URL-based or content-based redirects/aliases, the latter storing a content ID on the alias. This prevents the kind of multi-jump alias chains you mentioned, which are a no-go for SEO, browsers and performance alike: when triggered, a content-based alias always resolves the target content's current URL by ID, rather than relying on the URL text entered when the alias was created. It has helped quite a bit.
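A minimal Python sketch of the two alias kinds. The `Alias` record and the ID-to-current-URL lookup are hypothetical, not Piranha's alias model:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Alias:
    alias_url: str
    redirect_url: Optional[str] = None  # URL-based: fixed text target
    content_id: Optional[int] = None    # content-based: target looked up by ID


def redirect_target(alias: Alias, url_by_content_id: Dict[int, str]) -> Optional[str]:
    """Where the alias should send the visitor right now."""
    if alias.content_id is not None:
        # Resolve the content's current URL at request time, so however often
        # the page has moved, the visitor gets exactly one redirect.
        # Returns None if the content no longer exists.
        return url_by_content_id.get(alias.content_id)
    return alias.redirect_url


# Page 7 has moved twice; the alias left at its first URL follows it.
current = {7: "/company/team/about-us"}
print(redirect_target(Alias("/about-us", content_id=7), current))
print(redirect_target(Alias("/old-page", redirect_url="/new-page"), current))
```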
Well, given that some kind of model cache is enabled for the service layer (which should always be the case in production scenarios), the hierarchical `Sitemap` structure could be used for scanning the incoming request.
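A Python sketch of scanning a cached, in-memory sitemap tree, matching one segment per level. `SitemapNode` is a hypothetical stand-in holding only each page's own URL segment. It is not Piranha's `Sitemap` or `SitemapItem` type:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SitemapNode:
    """Hypothetical cached sitemap entry holding only its own URL segment."""
    id: int
    slug: str
    items: List["SitemapNode"] = field(default_factory=list)


def find(items: List[SitemapNode], segments: List[str]) -> Optional[SitemapNode]:
    """Match the first segment among siblings, then recurse into its children."""
    if not segments:
        return None
    head, rest = segments[0], segments[1:]
    for item in items:
        if item.slug == head:
            return find(item.items, rest) if rest else item
    return None


def resolve(sitemap: List[SitemapNode], path: str) -> Optional[SitemapNode]:
    return find(sitemap, [s for s in path.split("/") if s])


sitemap = [
    SitemapNode(1, "about-us", [SitemapNode(2, "info")]),
    SitemapNode(3, "services", [SitemapNode(4, "info")]),
]
print(resolve(sitemap, "/services/info").id)  # 4
```

Since the tree is already in memory, each level costs only a scan over that node's children, with no database round trip.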
Regarding alias creation, before v7.0 we had a function that suggested creating an alias through a toast notification when the slug was updated, but unfortunately it was not migrated to the new admin interface. It also had some checks for multiple redirects on creation if the slug was changed multiple times for the same page, so I think we can find some code to reuse there.
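A minimal sketch of creation-time checks of this kind, written in Python and not taken from the pre-7.0 code. `add_alias` is hypothetical, and `aliases` is a plain dict from alias URL to redirect URL. Existing rules are repointed so that every rule stays a single hop and loop-free:

```python
from typing import Dict


def add_alias(aliases: Dict[str, str], old_url: str, new_url: str) -> Dict[str, str]:
    """Record old_url -> new_url, keeping every rule a single hop and loop-free."""
    if old_url == new_url:
        return aliases
    # Rules that used to land on old_url now go straight to new_url.
    for source, target in list(aliases.items()):
        if target == old_url:
            aliases[source] = new_url
    # new_url is live content again, so it must not redirect anywhere; this also
    # drops the self-redirect the retargeting above may have produced.
    aliases.pop(new_url, None)
    aliases[old_url] = new_url
    return aliases


rules: Dict[str, str] = {}
add_alias(rules, "/about-us", "/company/about-us")  # first move
add_alias(rules, "/company/about-us", "/about-us")  # moved back to the root
print(rules)  # {'/company/about-us': '/about-us'}
```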
OK, I just realized another thing that would need to be handled. Currently the `Slug` field is unique per site. Since hierarchical permalinks are generated on save, it's fully possible to have the following pages:
/about-us/info
/services/info
However, if we remove the hierarchical part from the slug so it only contains the URL segment of the current page, then the unique index would need to be on (`SiteId`, `ParentId`, `Slug`).
This also means that moving a page can cause conflicts that need to be resolved in some way.
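A small Python sketch of what the composite key allows and where a move can collide. The parent IDs are made up, and appending `-2`, `-3`, ... is only one hypothetical way to resolve a conflict; prompting the user would be another:

```python
from typing import Optional, Set, Tuple

Key = Tuple[int, Optional[int], str]  # (SiteId, ParentId, Slug)


def free_slug(taken: Set[Key], site_id: int, parent_id: Optional[int], slug: str) -> str:
    """Return slug, or slug-2, slug-3, ... if a sibling already uses it."""
    candidate, n = slug, 2
    while (site_id, parent_id, candidate) in taken:
        candidate = f"{slug}-{n}"
        n += 1
    return candidate


def move_page(taken: Set[Key], site_id: int, slug: str,
              old_parent: Optional[int], new_parent: Optional[int]) -> str:
    """Move a page while keeping (SiteId, ParentId, Slug) unique; returns its slug."""
    taken.discard((site_id, old_parent, slug))
    new_slug = free_slug(taken, site_id, new_parent, slug)
    taken.add((site_id, new_parent, new_slug))
    return new_slug


# /about-us/info (parent 10) and /services/info (parent 20) coexist under the new key...
taken = {(1, 10, "info"), (1, 20, "info")}
# ...but moving the first "info" under services collides and must be resolved.
print(move_page(taken, 1, "info", 10, 20))  # info-2
```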
@i-love-code @eric-wilson As a test, I implemented recursive scanning for permalinks in the router, and it was not that complicated. Given that it operates on the `Sitemap` structure, which is cached (and already retrieved in the middleware), I'm guessing this routing will be more performant than the existing one 😜
https://github.com/PiranhaCMS/piranha.core/commit/b8dd81c578aedbe6107a0db2ecb93341365e0f72
Of course, it needs a lot of other work before it is complete.
I agree that the slug should only contain the page's own URL segment, with uniqueness determined by `Slug`, `ParentId` and `SiteId`.
Maybe we should bring back the notification that helps create an alias when the user changes the slug or moves a page/post.
I'll check that out @tidyui.
Any chance we could come up with a (rough) list of what else might be needed for this to be functional/usable? I'd love to contribute, so a list of what's necessary/ideal for an MVP would be great.
@i-love-code Absolutely, I'll take a closer look and make a complete list of what would be needed to complete the functionality.
Is it not possible to separate the slug and the permalink? For example, have a unique GUID represent the permalink but the slug be dynamic? In my mind, the slug should be dynamic because to me the URL matters. In cases where a permalink is required, I think it's an acceptable trade-off for it to be a non-obvious value (such as a GUID).
For example, the slug could be `about-me/sub-page-with-long-slug`, while the permalink could be something like `pages/704a7099-e7bd-4f08-a25f-fc9a04bb1044`.
Either way, I'm in favor of the hierarchical slugs option as well as increasing the maximum slug length.
If you'd like to access pages with a URL like `pages/704a7099-e7bd-4f08-a25f-fc9a04bb1044`, you can just add your own middleware early in the pipeline that gets the slug for the page with the given `Id` and rewrites the request before handing it over to the next middleware component.
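The rewrite step itself is small. Here is a framework-neutral Python sketch of the logic such a middleware might run before passing the request on. In Piranha this would live in custom ASP.NET Core middleware, but `rewrite_permalink` and the `slug_by_id` lookup are hypothetical, not Piranha APIs:

```python
import re
import uuid
from typing import Dict

PERMALINK = re.compile(r"^/pages/([0-9a-fA-F-]{36})/?$")


def rewrite_permalink(path: str, slug_by_id: Dict[uuid.UUID, str]) -> str:
    """Map /pages/{id} to the page's current slug; leave every other path alone."""
    match = PERMALINK.match(path)
    if not match:
        return path
    try:
        page_id = uuid.UUID(match.group(1))
    except ValueError:
        return path  # 36 chars of hex and dashes, but not a valid GUID
    slug = slug_by_id.get(page_id)
    return f"/{slug}" if slug else path


slugs = {uuid.UUID("704a7099-e7bd-4f08-a25f-fc9a04bb1044"): "about-me/sub-page-with-long-slug"}
print(rewrite_permalink("/pages/704a7099-e7bd-4f08-a25f-fc9a04bb1044", slugs))
```

Because the slug is looked up on every request, a `pages/{id}` permalink keeps working after the page is moved and its slug changes.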
If this is possible, then I definitely think that moving a page should change the slug. It would be nice, however, to see an official implementation of this type of permalink middleware.