ultimate-sitemap-parser

SSL Certificate error fix?

Open ma26yank opened this issue 3 years ago • 1 comment

I was testing this package for a web crawler I am building, but at times it gives the error below. Is there an argument I have to pass, or is this a bug?

_IndexWebsiteSitemap(url=https://www.crummy.com/, sub_sitemaps=[InvalidSitemap(url=https://www.crummy.com/robots.txt, reason=Unable to fetch sitemap from https://www.crummy.com/robots.txt: HTTPSConnectionPool(host='www.crummy.com', port=443): Max retries exceeded with url: /robots.txt (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (ssl.c:1131)'))))])

What I am trying is:

from usp.tree import sitemap_tree_for_homepage
tree = sitemap_tree_for_homepage("https://www.crummy.com")
print(tree)

ma26yank, Mar 08 '22 12:03

You can write your own subclass of RequestsWebClient and, in its get method, call requests.get( ... , verify=False) so that certificate verification is skipped.

Then do something like sitemap_tree_for_homepage('https://www.crummy.com', web_client=MyClient())
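A minimal sketch of what such a subclass could look like. It assumes the module path usp.web_client.requests_client and the response classes RequestsWebClientSuccessResponse and RequestsWebClientErrorResponse, so check those against your installed version. The names build_insecure_client and InsecureRequestsWebClient and the 60-second timeout are made up for illustration. It is also simplified: it does not send the library's User-Agent header, does not cap the response size, and treats every non-2xx status as non-retryable.

```python
def build_insecure_client():
    """Return a usp web client whose get() skips TLS certificate verification."""
    # Imports are local so that loading this sketch does not require usp.
    import requests
    # Assumed module path and class names; verify against your usp version.
    from usp.web_client.requests_client import (
        RequestsWebClient,
        RequestsWebClientErrorResponse,
        RequestsWebClientSuccessResponse,
    )

    class InsecureRequestsWebClient(RequestsWebClient):
        # Hypothetical class name, not part of the library.

        def get(self, url):
            try:
                response = requests.get(
                    url,
                    timeout=60,    # assumed value for this sketch
                    stream=True,
                    verify=False,  # the actual fix: no certificate check
                )
            except requests.exceptions.Timeout as ex:
                return RequestsWebClientErrorResponse(message=str(ex), retryable=True)
            except requests.exceptions.RequestException as ex:
                return RequestsWebClientErrorResponse(message=str(ex), retryable=False)

            if 200 <= response.status_code < 300:
                return RequestsWebClientSuccessResponse(requests_response=response)

            # Simplification: all HTTP error statuses are reported as non-retryable.
            message = "{} {}".format(response.status_code, response.reason)
            return RequestsWebClientErrorResponse(message=message, retryable=False)

    return InsecureRequestsWebClient()
```

With this in place, the call becomes sitemap_tree_for_homepage('https://www.crummy.com', web_client=build_insecure_client()). With verify=False, requests accepts any certificate and urllib3 emits an InsecureRequestWarning for each request, so this is best limited to hosts you already trust.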

japherwocky, Dec 22 '22 21:12