
Ability to scan only sitemap.xml URLs

Open a1402980 opened this issue 6 months ago • 3 comments

I would like the ability to scan only the URLs listed in the sitemap, to verify that the links generated there actually work. For example, when I pass https://example.com/sitemap.xml as the URL, the crawler would check every URL listed in that file. Currently it only checks that the sitemap itself loads.
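As a rough illustration of the requested behaviour (this is not the crawler's code, only a standard-library Python sketch), the check amounts to reading every `<loc>` entry from the sitemap and requesting each one. The helper names `extract_sitemap_urls` and `check_url` are hypothetical and chosen for this example only.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document (str or bytes)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def check_url(url, timeout=10.0):
    """Return the HTTP status code for a HEAD request to `url`.

    Connection-level failures (urllib.error.URLError) are not caught here
    and propagate to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their status code.
        return err.code
```

With these two helpers, checking a sitemap would mean fetching the file, calling `extract_sitemap_urls` on its body, and calling `check_url` on each result.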

a1402980 • Jun 30 '25 15:06

I have just tested this using the source code. Running ./crawler --url=https://my-site.com/sitemap.xml crawled all URLs found in the sitemap. What version are you using?

AleksaRistic216 • Jul 19 '25 11:07

I was using the latest macOS arm64 version of the GUI and, as an example, I used https://crawler.siteone.io/sitemap.xml

Image

The report shows that it visited only the sitemap.xml page and none of the URLs listed in the sitemap.

Image

I used the default settings:

Image

a1402980 • Jul 22 '25 19:07

The GUI application may bundle a different version of the core application (this one). Judging by the link you sent me, the GUI uses 1.0.8, while the latest release is 1.0.9 (and the source code may be a few steps ahead of that too).

In the release notes I can see: "Crawl from Sitemap: You can now provide a URL to a sitemap.xml or sitemap index file directly to the --url parameter to crawl all listed URLs." I am not sure whether this covers the whole sitemap functionality, but it is the likely explanation.
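For context on the "sitemap.xml or sitemap index file" distinction in that release note: under the sitemaps.org protocol a regular sitemap has a `<urlset>` root whose `<loc>` entries are pages, while an index has a `<sitemapindex>` root whose `<loc>` entries point to further sitemaps. The sketch below only illustrates that distinction in plain Python. It is an assumption about what such handling involves, not the crawler's actual implementation, and `classify_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def classify_sitemap(xml_text):
    """Return (kind, locs), where kind is 'index' or 'urlset'.

    For 'urlset' the locs are page URLs; for 'index' they are URLs of
    child sitemaps that would have to be fetched and parsed in turn.
    """
    root = ET.fromstring(xml_text)
    locs = [loc.text.strip() for loc in root.iter(f"{NS}loc") if loc.text]
    if root.tag == f"{NS}sitemapindex":
        return "index", locs
    if root.tag == f"{NS}urlset":
        return "urlset", locs
    raise ValueError(f"not a sitemap document: root element is {root.tag}")
```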

AleksaRistic216 • Aug 08 '25 18:08