Allow user agents to be customized in robots.txt
Summary
The ability to read a text file containing robots.txt customizations, so that the customization can be backed up or persisted outside the docker container.
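For illustration only, this is roughly how I would expect to use such a feature. The container path, file name, and the option itself are hypothetical; nothing like this exists in Shlink today:

```
# Hypothetical sketch: keep the robots.txt customization in a file on the host
# and mount it into the container, so it survives image upgrades.
# The container path /etc/shlink/robots-custom.txt is made up for this example.
docker run --name shlink -p 8080:8080 \
  -v /opt/shlink/robots-custom.txt:/etc/shlink/robots-custom.txt:ro \
  shlinkio/shlink:stable-roadrunner
```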
Use case
I've been editing the module/Core/src/Action/RobotsAction.php file inside the container because I (and possibly many other people with similar needs) would like to allow Facebook's bot[1], so that Shlink links show an article preview when I paste them. But this broke when I switched to stable-roadrunner (great image btw!) because -- obviously -- I forgot to re-apply my robots.txt customization.
Since this feature would be pretty straightforward (I already know which file outputs the robots.txt content), I was thinking of adding it myself. However, I'm not sure whether externalizing part of robots.txt so users can persist it outside the container is a good idea, so I would like to validate with you whether I should add this feature.
Thanks for the great work folks btw!
[1] Allowing Facebook's user-agent in robots.txt:
User-agent: facebookexternalhit
Disallow:
A related topic was recently discussed here https://github.com/shlinkio/shlink/discussions/2067, and while I would prefer not to expect people to customize robots.txt by providing a file, I agree a certain level of customization should be possible.
I mentioned some of the problems and history of the current implementation here https://github.com/shlinkio/shlink/discussions/2067#discussioncomment-9179521, and I already put together and merged a feature to allow all short URLs to be crawled by default, if desired: https://github.com/shlinkio/shlink/pull/2107. That would result in the same thing you mentioned above, but for any crawler, not just Facebook's specifically.
On top of that, the only missing piece would be to allow you to provide a list of user agents you want to allow, falling back to * if the option is not provided. Something along the lines of ROBOTS_ALLOW_USER_AGENTS=facebookexternalhit,Googlebot.
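To make the idea more concrete, here is a sketch of the kind of groups such an option could produce. This is illustrative only; the exact directives Shlink would generate are not decided by this comment:

```
# With something along the lines of ROBOTS_ALLOW_USER_AGENTS=facebookexternalhit,Googlebot
User-agent: facebookexternalhit
User-agent: Googlebot
Disallow:

# Falling back to any crawler when the option is not provided
User-agent: *
Disallow:
```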
That said, you can already make your short URLs crawlable, with the limitation that it needs to be done one by one, hence the PR above.
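For reference, flagging an individual short URL as crawlable can be done through the REST API's edit endpoint. Something along these lines should work (written from memory, so double-check against the API docs; the domain, short code, and API key are placeholders):

```
# Mark one existing short URL as crawlable, one at a time
curl -X PATCH https://example.com/rest/v3/short-urls/abc123 \
  -H "X-Api-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"crawlable": true}'
```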
Thanks! I'll take a look at https://github.com/shlinkio/shlink/pull/2107 next time!
I'm going to re-purpose this issue to specifically allow user agents to be customized in robots.txt. That plus the already existing capabilities around robots.txt should cover most use cases in a more predictable and reproducible way.
Later on, if there's still some missing capability, I'm open to discuss more improvements and features.
That's cool @acelaya !! Thank you!!
This feature is now implemented and will be part of Shlink 4.2.