php-sitemap-generator

Inappropriate link protocols included in sitemap

Open • Steve-A-Orr opened this issue 4 years ago • 1 comment

I would argue that links using the "mailto" and "tel" protocols should not be included in the sitemap. An email link showed up in my sitemap because its address contained the website domain (whereas a Gmail address was not included, because external links were not being crawled). A reference explaining this type of link: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a

For example, a crawl of the "www.examplesite.com" website that contains a "mailto" link using the "[email protected]" address would produce an entry like this in the sitemap: http://www.examplesite.com/mailto:[email protected]
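To make the distinction concrete, here is a minimal sketch of a scheme check that tells web links apart from "mailto" and "tel" links. The function name `isCrawlableHref` and the sample addresses are hypothetical and are not taken from the generator's source; it only illustrates one way such links could be recognised before they are turned into sitemap entries.

```php
<?php
// Hypothetical helper for illustration; not part of php-sitemap-generator.
function isCrawlableHref(string $href): bool
{
    // A leading "scheme:" marks an absolute URI (for example "mailto:" or "tel:").
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', trim($href), $m) === 1) {
        // Only http and https links point at pages that belong in a sitemap.
        return in_array(strtolower($m[1]), ['http', 'https'], true);
    }
    // No scheme: treat it as a relative link to a page on the same site.
    return true;
}

// Sample values are placeholders, not addresses from the report.
var_dump(isCrawlableHref('mailto:someone@examplesite.com')); // bool(false)
var_dump(isCrawlableHref('tel:+15550100'));                  // bool(false)
var_dump(isCrawlableHref('/contact.html'));                  // bool(true)
var_dump(isCrawlableHref('https://www.examplesite.com/'));   // bool(true)
```

One caveat of this sketch: a scheme-less link of the form "host:port/path" would also be rejected, since the part before the colon looks like a scheme.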

Steve-A-Orr, Apr 28 '21 01:04

Hi Steve,

Thank you for opening this issue.

I agree that these links should be excluded from the generator. A possible solution would be to add "mailto:" and "tel:" to the "KEYWORDS_TO_SKIP" array.

Example

"KEYWORDS_TO_SKIP" => array(
    "mailto:",  // Will skip all entries of mailto: when crawling
    "tel:", // Will skip all entries of tel: when crawling
)

This will exclude all links of these types from your sitemap, while keeping the possibility of including them if anyone wishes to do so.
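As a rough illustration of what keyword-based skipping does, the sketch below filters a list of candidate URLs with a case-insensitive substring match. Only the "KEYWORDS_TO_SKIP" key comes from this thread; the `containsSkippedKeyword` helper, the `$config` and `$links` variables, and the sample URLs are assumptions made for the example, and the generator's actual matching logic may differ.

```php
<?php
// Hypothetical config; only the "KEYWORDS_TO_SKIP" key is taken from the thread.
$config = [
    "KEYWORDS_TO_SKIP" => ["mailto:", "tel:"],
];

// Hypothetical helper: true when the URL contains any of the keywords.
function containsSkippedKeyword(string $url, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        // Case-insensitive substring match anywhere in the URL.
        if ($keyword !== '' && stripos($url, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Placeholder URLs for illustration.
$links = [
    "http://www.examplesite.com/about",
    "http://www.examplesite.com/mailto:someone@examplesite.com",
    "http://www.examplesite.com/tel:+15550100",
];

// Keep only the URLs that match none of the keywords (requires PHP 7.4+).
$kept = array_values(array_filter(
    $links,
    fn (string $url): bool => !containsSkippedKeyword($url, $config["KEYWORDS_TO_SKIP"])
));

print_r($kept); // Only the "/about" URL remains.
```

Because the match is a plain substring test, a keyword such as "tel:" would also match a URL containing "hotel:", so keywords are best kept as specific as possible.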

Tristan Goossens

tristangoossens, Apr 28 '21 09:04