Max length for URLs
Motivation
Currently, a link consists of the page title, or of the chain of page titles from the breadcrumbs. This leads to very long links, especially when the titles are not written in Latin letters. Such links don't look very nice when you link to a page, and some apps cut them off when you share them. When the cut breaks the URL formatting, we get Sentry issues. We've also discussed this in the App repo.
Proposed Solution
I can think of a couple of solutions but I'm sure other people can think of plenty more:
- We can have a hard limit on how many characters each part of a link has, and cut off everything after. This could lead to some pages having the same URL :(
- We can force the user to create a short URL. This is probably annoying for everyone involved during implementation but would probably work afterwards.
- We can limit URLs to ASCII (or at least Latin) characters, e.g. by using the German / English version of a page together with a different language parameter. That might not shorten all URLs sufficiently though.
- We can use only the short URLs, but I assume people don't like them.
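To make the collision risk of the hard limit concrete, here is a minimal sketch. The page slugs, the limit of 16 characters, and the helper names `truncate_segment` and `unique_truncated` are all made up for illustration; the numeric suffix is just one naive way to keep truncated parts distinct, not an agreed design.

```python
def truncate_segment(segment: str, max_chars: int) -> str:
    """Cut a URL part after max_chars characters and drop a dangling hyphen."""
    return segment[:max_chars].rstrip("-")


def unique_truncated(segments: list[str], max_chars: int) -> list[str]:
    """Truncate every part and append -2, -3, ... to repeated results.

    Naive on purpose: the suffix makes the part slightly longer than
    max_chars, and it could still clash with a part that literally ends
    in "-2".
    """
    seen: dict[str, int] = {}
    result = []
    for segment in segments:
        base = truncate_segment(segment, max_chars)
        seen[base] = seen.get(base, 0) + 1
        result.append(base if seen[base] == 1 else f"{base}-{seen[base]}")
    return result


# Hypothetical sibling pages that only differ after the 16th character.
titles = [
    "residence-permit-application",
    "residence-permit-extension",
    "language-courses",
]

truncated = [truncate_segment(title, 16) for title in titles]
print(truncated)                     # the first two parts are now identical
print(unique_truncated(titles, 16))  # the second one gets a "-2" suffix
```

Any real implementation would also have to decide what happens to links that were already shared before the limit was introduced.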
Alternatives
We could also leave things as they are, since this is not that common a problem. But it is annoying if someone shares an interesting link with you, presumably because they think it will help you with a specific problem you have, and then the link just errors out.
User Story
As an Integreat app user, I want to be able to paste links into any communication app I want, so that other people can follow the link that I shared and open the correct page.
Additional Context
First step for this issue: Do some research and present a few possible solutions
My first association: The snippet in the issue title on Sentry is ?(weilheim-schongau/ru/%D0%BF%D0%BE%D0%BB%D0%B5%D1%82-%D1%83%D0%B1%D0%B5%D0%B6%D0%B8%D1%89%D0%B5/%D0%BB%D1%8C%D0%B3%D0%BE%D1%82%D1%8B-%D0%BF%D0%BE-%D0%BF%D1%80%D0%B5%D0%B4%D0%BE%D1%81%D1%82%D0%B0%D...
This suggests to me that the problem mainly arises from non-ASCII characters being used in the slug, which also becomes the URL. There they get URL-encoded, meaning that every byte of a non-ASCII character becomes three ASCII characters. A Cyrillic letter takes two bytes in UTF-8, so it ends up as six characters in the URL.
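The growth can be checked with Python's standard library. This is only a quick sketch: it decodes the first complete path segment from the Sentry snippet above and compares lengths before and after percent-encoding; the variable names are arbitrary.

```python
from urllib.parse import quote, unquote

# First complete path segment after "ru/" in the Sentry snippet.
encoded = (
    "%D0%BF%D0%BE%D0%BB%D0%B5%D1%82-"
    "%D1%83%D0%B1%D0%B5%D0%B6%D0%B8%D1%89%D0%B5"
)

decoded = unquote(encoded)
print(decoded)       # полет-убежище
print(len(decoded))  # 13 characters in the slug
print(len(encoded))  # 73 characters in the URL

# Each UTF-8 byte of a non-ASCII character turns into "%XX" (3 characters).
for char in ["a", "ä", "п", "日"]:
    print(char, len(char.encode("utf-8")), len(quote(char)))
```

So a slug of 13 characters already occupies 73 characters in the URL, and scripts whose characters need three bytes in UTF-8 grow by a factor of nine.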
My gut feeling is that we should either
- change the way the slug is generated, restricting it to ASCII characters (I know that Japanese has a romanized spelling, maybe there are similar methods for most other languages as well…?)
- or leave the slugs as they are, but translate them to some more compact ASCII-only form for URLs
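Both options can be sketched with the standard library alone. The transliteration table below is hypothetical and deliberately partial (it only covers the letters of the example slug); a real solution would need a complete table or a dedicated library per language. For the second option, Punycode (the encoding used for internationalized domain names) serves as one example of a reversible ASCII-only form; whether it is a good fit for our URLs is an open question.

```python
from urllib.parse import quote

slug = "полет-убежище"

# Option 1: romanize when generating the slug.
# Hypothetical, partial table covering only the letters used above.
CYRILLIC_TO_LATIN = str.maketrans({
    "п": "p", "о": "o", "л": "l", "е": "e", "т": "t", "у": "u",
    "б": "b", "ж": "zh", "и": "i", "щ": "shch",
})


def romanize(text: str) -> str:
    """Replace known Cyrillic letters; unknown characters stay unchanged."""
    return text.translate(CYRILLIC_TO_LATIN)


# Option 2: keep the slug, but use a reversible ASCII form in the URL.
def to_compact_ascii(text: str) -> str:
    return text.encode("punycode").decode("ascii")


def from_compact_ascii(text: str) -> str:
    return text.encode("ascii").decode("punycode")


romanized = romanize(slug)
compact = to_compact_ascii(slug)

print(len(quote(slug)), quote(slug))   # percent-encoded, as today
print(len(romanized), romanized)       # readable, but not reversible
print(len(compact), compact)           # reversible, but not readable
print(from_compact_ascii(compact) == slug)
```

The trade-off is that romanized slugs stay readable for people who know the Latin alphabet but cannot be mapped back to the original title, while the Punycode form maps back exactly but means nothing to a human reader.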