python-sitemap

No URLs found

Open exportio opened this issue 5 years ago • 7 comments

Number of found URL : 1
Number of links crawled : 1

python main.py --domain https://www.domain.com --output sitemap.xml --report

<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

</urlset>

exportio avatar Mar 21 '19 01:03 exportio

Hi,

Interesting… I can't reproduce the problem here. Which Python version are you using?

Screenshot 2019-03-21 at 09 25 18

c4software avatar Mar 21 '19 08:03 c4software

@samboustani Is the problem still present?

c4software avatar Mar 27 '19 12:03 c4software

Same problem here.

GovetaXV avatar Mar 30 '19 13:03 GovetaXV

Try this URL: https://paperarchive.space/

GovetaXV avatar Mar 30 '19 13:03 GovetaXV

@GovetaXV Hi,

Thanks for the link. Unfortunately, the current version of python-sitemap doesn't support "full JavaScript" websites, which is why paperarchive.space doesn't work.

Sorry

c4software avatar Apr 01 '19 15:04 c4software
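
To illustrate the limitation described above: a crawler that only parses the HTML returned by the server finds no links on a page whose content is built by JavaScript in the browser. The sketch below is not python-sitemap's actual code; `LinkCollector`, `extract_links`, and the two sample pages are hypothetical and use only the standard library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors that exist in the served markup are visible here.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every <a href> found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical server-rendered page: links are in the markup.
STATIC_PAGE = (
    '<html><body><a href="/about">About</a> '
    '<a href="/contact">Contact</a></body></html>'
)

# Hypothetical JavaScript-only page: the markup is an empty shell
# and the links would be created later by /bundle.js in a browser.
JS_PAGE = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)

print(extract_links(STATIC_PAGE))
print(extract_links(JS_PAGE))
```

The second call has nothing to collect because the script is never executed, so a crawl of such a site stops at the start URL.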

+1, same issue. No error log.

ishannaktode avatar Jun 03 '19 10:06 ishannaktode

This looked pretty hopeful, but didn't work for me either. This isn't a full headless site by any means.

$ python3 main.py --domain https://canada.ca --output sitemap.xml --report
Number of found URL : 1
Number of links crawled : 1
$ cat sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

</urlset>

But maybe this helps.

$ python3 main.py --domain https://canada.ca --output sitemap.xml --debug
INFO:root:Start the crawling process
INFO:root:Crawling #0: https://canada.ca
DEBUG:root:https://canada.ca ==> <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>
INFO:root:Crawling has reached end of all found links

mgifford avatar Aug 11 '20 15:08 mgifford
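
The debug output above shows the request failing on certificate verification before any page is parsed, which would explain a single crawled URL and an empty sitemap. A minimal sketch for checking this outside the crawler, assuming the standard `urllib`; `describe_failure` and `try_fetch` are hypothetical helper names, not part of python-sitemap:

```python
import ssl
import urllib.error
import urllib.request


def describe_failure(exc):
    """Return a short label for why a fetch failed."""
    # URLError wraps the underlying cause in .reason.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, ssl.SSLError) and "CERTIFICATE_VERIFY_FAILED" in str(reason):
        return "certificate verification failed"
    return "other error: {}".format(reason)


def try_fetch(url, timeout=10):
    """Fetch a URL once and report either the HTTP status or the failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "ok: HTTP {}".format(response.status)
    except urllib.error.URLError as exc:
        return describe_failure(exc)
```

Calling `try_fetch("https://canada.ca")` with the same interpreter should show whether that Python installation can verify the site's certificate. If it cannot, the interpreter's CA certificate bundle is one place to look.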