ros2_documentation

No robots.txt, sitemap.xml in the web root

Open abrandemuehl opened this issue 1 month ago • 5 comments

Issue Type

  • [x] 🐛 Bug / Problem
  • [ ] ✏️ Typo / Grammar
  • [ ] 📖 Outdated Content
  • [ ] 🚀 Enhancement

Generated by Generative AI

No response

Distribution

No response

Description

I see that make multiversion generates a sitemap.xml for every version of the documentation, but these XML files return 404 on the deployed site. Is someone manually telling Google to index pages? Why are robots.txt and sitemap.xml not visible on the deployed site?

The deployed site returns 404 for the following:

  • https://docs.ros.org/sitemap.xml (this one actually is generated in build/html)
  • https://docs.ros.org/robots.txt (this one doesn't get generated)

This is related to the discussion here: https://discourse.openrobotics.org/t/discoverability-of-documentation-on-search-engines/51059

https://github.com/ros-infrastructure/rosindex/issues/552 was also mentioned there. I also noticed that index.ros.org has a proper /robots.txt and sitemap.xml.

Another related problem is that the generated sitemap.xml files for each version don't include all of the automatically generated documents hosted on docs.ros.org, such as https://docs.ros.org/en/humble/p/tf2/, so I wonder whether those pages are being indexed by Google.
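One way to check this coverage gap is to compare the `<loc>` entries of a generated sitemap against a list of URLs that should be present. The following is a minimal Python sketch, not part of the ros2_documentation tooling: the function names and the sample sitemap content (including the index.html URL) are hypothetical, and it assumes the generated files use the standard sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
# (assumed to be what the generated sitemap.xml files use).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return the set of URLs found in <loc> elements of a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(xml_text, expected_urls):
    """Return the expected URLs that the sitemap does not list, in input order."""
    listed = sitemap_locs(xml_text)
    return [url for url in expected_urls if url not in listed]


# Hypothetical sitemap content, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.ros.org/en/humble/index.html</loc></url>
</urlset>"""

print(missing_from_sitemap(SAMPLE, ["https://docs.ros.org/en/humble/p/tf2/"]))
```

Running this against the real generated file in build/html, with a list of package documentation URLs, would show which of those pages the sitemap omits.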

Affected Pages/Sections

No response

Screenshots or Examples (if applicable)

No response

Suggested Fix

I think there's a problem with the web server's deployment configuration that prevents sitemap.xml from being reached. In addition, a robots.txt pointing to the sitemap index would help search engines.

I'm not sure how the package-specific docs are created, but those should also be in the sitemap.
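As an illustration of the robots.txt suggested above, here is a minimal Python sketch that writes a permissive file advertising a sitemap through the `Sitemap:` directive. The helper name `build_robots_txt` is hypothetical, and whether https://docs.ros.org/sitemap.xml is the right sitemap index to advertise is an assumption taken from the URL mentioned in this issue.

```python
def build_robots_txt(sitemap_urls):
    """Build a permissive robots.txt that advertises the given sitemap URLs."""
    # An empty Disallow value blocks nothing, so all crawlers may fetch every path.
    lines = ["User-agent: *", "Disallow:", ""]
    # One Sitemap line per sitemap or sitemap index URL.
    lines += [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + "\n"


# Assumed sitemap location, taken from the URL discussed in this issue.
print(build_robots_txt(["https://docs.ros.org/sitemap.xml"]))
```

The output would be served from the web root as /robots.txt. How that file would be deployed on docs.ros.org is not covered here.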

Additional Context

No response

abrandemuehl avatar Nov 21 '25 09:11 abrandemuehl

This issue has been mentioned on Open Robotics Discourse. There might be relevant details there:

https://discourse.openrobotics.org/t/discoverability-of-documentation-on-search-engines/51059/7

ros-discourse avatar Nov 21 '25 09:11 ros-discourse

If someone has access to the Google Search Console and could post some screenshots of what Google actually sees when it looks at docs.ros.org, that would also help in diagnosing why a lot of ROS documentation results are not showing up on Google.

abrandemuehl avatar Nov 21 '25 09:11 abrandemuehl

Thanks for the suggestion.

I am not sure if we have the Google Search Console set up in the first place. It looks like we would need to first enable it and then figure out how we want to manage access. Let me ping the infra team and see what we can do. We've also got some internal efforts happening, and I want to make sure we don't step on those toes.

kscottz avatar Nov 21 '25 18:11 kscottz

I've been watching it for index.ros.org but don't have docs.ros.org in my view. I'll reach out to OSRF to see if I can take a look at this.

tfoote avatar Nov 21 '25 19:11 tfoote

From triage meeting: assigning to @gbiggs for delegation to the right person

wjwwood avatar Dec 04 '25 18:12 wjwwood

@gbiggs got me access to the search console. The multiversion sitemaps are all being indexed successfully.

Image

robots.txt is a blocking mechanism and a lack of it won't prevent things from being indexed.
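The point that robots.txt only blocks can be sketched with Python's standard `urllib.robotparser`. This models how the standard library interprets the rules, not how Google's crawler behaves; the helper name `is_allowed` and the example Disallow rule are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt content permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


PAGE = "https://docs.ros.org/en/humble/p/tf2/"

# No rules at all: nothing is blocked.
print(is_allowed("", PAGE))

# A hypothetical Disallow rule is what actually blocks a path.
print(is_allowed("User-agent: *\nDisallow: /en/humble/p/\n", PAGE))
```

With no rules the page is allowed, and it is only refused once a matching Disallow rule exists.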

I'm going to close this, as the generated sitemaps are working fine. I'll also open some new issues elsewhere for other things I've noticed from insights in the console.

tfoote avatar Dec 18 '25 21:12 tfoote