💡 Feature request: Support for the Canonical Tag
Is your feature request related to a problem? Please describe.
I've been noticing more and more Redlib instances being indexed by search engines, causing a lot of duplicated content that search engines have no way to de-duplicate (speaking as a user of Kagi, not Google). Currently, my solution is to block those domains in my searches, but I realize that most people don't have this ability.
That said, the vast majority of Redlib instances appear to be offline (e.g. timeouts or 500 errors) - so it's a general hindrance.
Describe the feature you would like to be implemented
Since ROBOTS_DISABLE_INDEXING is disabled by default, it would be nice for Redlib to include a canonical tag in its pages so that web crawlers can avoid duplication.
Ideally, this feature would be enabled by default.
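To illustrate what I mean, here is a minimal sketch in Python (for brevity; this is not Redlib's actual code). It assumes that an instance's paths mirror Reddit's own paths and that the canonical target should be `https://www.reddit.com`; both the `UPSTREAM_ORIGIN` constant and the `canonical_link` helper are hypothetical names.

```python
from html import escape
from urllib.parse import urlsplit

# Assumption: instance paths mirror Reddit's paths one-to-one.
UPSTREAM_ORIGIN = "https://www.reddit.com"


def canonical_link(request_target: str) -> str:
    """Build a <link rel="canonical"> element for a request path (hypothetical helper)."""
    # Keep only the path so that query strings (sort order, etc.)
    # do not produce several canonical URLs for the same page.
    path = urlsplit(request_target).path or "/"
    href = UPSTREAM_ORIGIN + path
    # Escape the URL before placing it inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


print(canonical_link("/r/rust/comments/abc123/title/?sort=new"))
```

The resulting element would go in the page's `<head>`, so a crawler that honors it can treat the instance page as a duplicate of the upstream one.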
Describe alternatives you've considered
Alternatively, it might benefit the Internet if ROBOTS_DISABLE_INDEXING were enabled by default - so that users whose personal instances aren't protected by authentication don't have members of the public randomly showing up and eating into their rate limits.
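As a rough sketch of that alternative (again Python for brevity, not Redlib's actual code): the `robots_txt` helper and the accepted flag values (`on`, `true`, `1`) are assumptions for illustration, and the only change being proposed is that the setting defaults to on when unset.

```python
import os
from typing import Mapping


def robots_txt(env: Mapping[str, str] = os.environ) -> str:
    """Return a robots.txt body based on ROBOTS_DISABLE_INDEXING (hypothetical helper)."""
    # Proposed behaviour: an unset variable is treated as "on".
    # The accepted values below are assumptions, not Redlib's real parsing.
    value = env.get("ROBOTS_DISABLE_INDEXING", "on").strip().lower()
    if value in {"on", "true", "1"}:
        # Ask all crawlers to stay away from every path.
        return "User-agent: *\nDisallow: /\n"
    # Indexing explicitly allowed by the instance operator.
    return "User-agent: *\nAllow: /\n"


print(robots_txt({}))
```

An operator who does want their instance indexed would then have to opt in explicitly by turning the setting off.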
Hm. Are there any previous examples of open-source front ends setting the Canonical tag? I'd be worried about downstream effects of setting this tag (I'm open to turning off indexing by default though).
The only reason I suggest using Canonical is that indexing is allowed by default (so Canonical would be the next best solution). I thought there might have been a reason for that default, e.g. Reddit blocking indexing from everyone except Google.
If there's no reason (my bad assumption), I think disabling indexing by default is best.
For comparison, I looked at the largest frontends that aren't Redlib:
| Project | Notes |
|---|---|
| Invidious | Indexing disabled, no canonical (although, they appear to track it) |
| nitter | Indexing disabled, no canonical |
| ProxiTok | Indexing disabled, no canonical |
I haven't found any OSS project that is a frontend and also enables indexing, FWIW.