python-sitemap
Mini website crawler to make sitemap from a website.
This package seems quite popular and would benefit from being on [PyPI](https://pypi.org/). We could check out [Poetry](https://github.com/python-poetry/poetry) to keep it simple. I can take a look at doing this one...
If the URL contains non-ASCII (Unicode) characters, Python reports an error. Debug info: > INFO:root:Crawling #1: https://gvo.wiki/html/NPC掉落書籍.html > DEBUG:root:https://gvo.wiki/html/NPC掉落書籍.html ==> 'ascii' codec can't encode characters in position 13-16: ordinal no...
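A minimal sketch of one possible fix (not the library's actual code): percent-encode the non-ASCII parts of a URL before requesting it, so the `'ascii' codec` error never occurs. The helper name `encode_url` is illustrative.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of a URL."""
    parts = urlsplit(url)
    # '%' is kept safe so already-encoded URLs are not double-encoded
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

print(encode_url("https://gvo.wiki/html/NPC掉落書籍.html"))
# → https://gvo.wiki/html/NPC%E6%8E%89%E8%90%BD%E6%9B%B8%E7%B1%8D.html
```

The CJK characters become `%XX` escapes that any ASCII-only layer can handle, while plain-ASCII URLs pass through unchanged.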
I got this error:

```
python3 main.py --domain https://domain.com --output sitemap.xml
Traceback (most recent call last):
  File "main.py", line 60, in
    crawl.run()
  File "/root/python-sitemap/crawler.py", line 127, in run
    self.__crawl(current_url)
  File "/root/python-sitemap/crawler.py",...
```
Sometimes we have URLs that are canonicalized to other pages, and these should not be included in the sitemap. See Google's reference: https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap So the logic would be to look...
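A sketch of the idea the issue describes, not the library's actual logic: only include a page when its `rel=canonical` link (if any) points back to the page itself. A regex is used here for brevity and assumes `rel` appears before `href`; a real crawler would use an HTML parser.

```python
import re

# Matches <link ... rel="canonical" ... href="..."> and captures the href value
CANONICAL_RE = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']',
    re.IGNORECASE,
)

def should_include(url: str, html: str) -> bool:
    """Return True if the page has no canonical link or is self-canonical."""
    match = CANONICAL_RE.search(html)
    if match is None:
        return True  # no canonical hint: keep the URL
    return match.group(1).rstrip("/") == url.rstrip("/")

html = '<head><link rel="canonical" href="https://example.com/real-page"/></head>'
should_include("https://example.com/duplicate", html)   # False: canonicalized elsewhere
should_include("https://example.com/real-page", html)   # True: self-canonical
```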
Hi, I am getting a SyntaxError when trying to execute the file, no matter what link I type in. Also, "" and '' don't work. Is there a way to...
Hi, just wanted to say thanks for such a great library. One need we have is to generate a sitemap for a site that has more than 50,000 URLs. The...
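The sitemap protocol caps each file at 50,000 URLs, so one common approach for larger sites (sketched below under assumed file names, not the tool's current behavior) is to split the URLs into chunks, write one sitemap per chunk, and point a sitemap index at them:

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemap protocol

def write_sitemaps(urls, base_url, prefix="sitemap"):
    """Write chunked sitemap files plus an index; return sitemap file names."""
    names = []
    for i in range(0, len(urls), MAX_URLS):
        name = f"{prefix}-{i // MAX_URLS + 1}.xml"
        with open(name, "w", encoding="utf-8") as f:
            f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
            for url in urls[i:i + MAX_URLS]:
                f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            f.write("</urlset>\n")
        names.append(name)
    # the index file lists every chunk by its public URL
    with open(f"{prefix}-index.xml", "w", encoding="utf-8") as f:
        f.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
        for name in names:
            f.write(f"  <sitemap><loc>{escape(base_url + name)}</loc></sitemap>\n")
        f.write("</sitemapindex>\n")
    return names
```

Search engines are then given only the index file, and discover the chunked sitemaps from it.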
I have a website with millions of categorized records; it would be useful if I could limit the number of URLs to parse per section, e.g. the first 900,000 URLs...
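A sketch of the per-section limit this issue asks for, assuming "section" means the first path segment of the URL; class and parameter names are illustrative:

```python
from collections import Counter
from urllib.parse import urlsplit

class SectionLimiter:
    """Allow at most `limit_per_section` URLs from each top-level path segment."""

    def __init__(self, limit_per_section: int):
        self.limit = limit_per_section
        self.counts = Counter()

    def allow(self, url: str) -> bool:
        """Return True if this URL's section is still under its quota."""
        segments = urlsplit(url).path.split("/")
        # absolute paths start with "/", so index 1 is the section name
        section = segments[1] if len(segments) > 1 else ""
        if self.counts[section] >= self.limit:
            return False
        self.counts[section] += 1
        return True

lim = SectionLimiter(2)
lim.allow("https://example.com/books/1")  # True
lim.allow("https://example.com/books/2")  # True
lim.allow("https://example.com/books/3")  # False: /books quota reached
lim.allow("https://example.com/music/1")  # True: different section
```

The crawler would call `allow()` before queueing each discovered URL and simply skip URLs whose section is full.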
```
python main.py --domain https://www.domain.com --output sitemap.xml --report
Number of found URL : 1
Number of links crawled : 1
```
We found that some websites considered the scraper too resource-intensive, so I added this configurable rate limiter to reduce the number of requests per time period.
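A minimal sketch of a rate limiter of the kind described: allow at most `max_requests` per `period` seconds using a sliding window, sleeping when the budget is exhausted. Class and parameter names are illustrative, not the PR's actual API.

```python
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_requests: int, period: float):
        self.max_requests = max_requests
        self.period = period
        self.timestamps = deque()  # monotonic times of recent requests

    def wait(self):
        """Block until another request is allowed, then record it."""
        now = time.monotonic()
        # drop timestamps that have aged out of the sliding window
        while self.timestamps and now - self.timestamps[0] >= self.period:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # sleep until the oldest recorded request leaves the window
            time.sleep(self.period - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())

limiter = RateLimiter(max_requests=5, period=1.0)
# call limiter.wait() immediately before each HTTP request in the crawl loop
```

A sliding window was chosen over a fixed-interval sleep so that short bursts are still allowed up to the configured budget.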
The issue with this tool is that once it halts, you have to start all over again from scratch, and with large sites this is a very common scenario. Since we...
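A sketch of the resumable-crawl idea: periodically persist the frontier and the set of already-crawled URLs to disk, and reload them on startup. The checkpoint file name and JSON format are illustrative, not part of the tool.

```python
import json
import os

STATE_FILE = "crawl_state.json"  # hypothetical checkpoint file

def save_state(to_crawl, crawled, path=STATE_FILE):
    """Atomically write crawl state so a crash mid-write leaves a usable file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"to_crawl": list(to_crawl), "crawled": list(crawled)}, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows

def load_state(path=STATE_FILE):
    """Return (to_crawl, crawled); empty state if no checkpoint exists."""
    if not os.path.exists(path):
        return [], set()
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    return state["to_crawl"], set(state["crawled"])
```

Calling `save_state` every N pages keeps the cost low, and restarting with `load_state` lets the crawl resume from roughly where it stopped instead of from scratch.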