python-sitemap
URL UnicodeEncodeError
If a URL contains non-ASCII (Unicode) characters, Python raises a UnicodeEncodeError while crawling.
debug info:
INFO:root:Crawling #1: https://gvo.wiki/html/NPC掉落書籍.html
DEBUG:root:https://gvo.wiki/html/NPC掉落書籍.html ==> 'ascii' codec can't encode characters in position 13-16: ordinal not in range(128)
Solution:
- Edit crawler.py and add the following imports at the top:
import string
from urllib.parse import quote
- Then search for this line:
current_url = self.urls_to_crawl.pop()
- and add a line below it, so that it reads:
current_url = self.urls_to_crawl.pop()
current_url = quote(current_url, safe=string.printable)
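To see what the added line does outside the crawler, here is a minimal standalone sketch. The helper name `to_ascii_url` is hypothetical and not part of python-sitemap; the URL is the one from the debug output above. It assumes the goal is only to percent-encode the non-ASCII characters while leaving every printable ASCII character (including `:`, `/` and existing `%` escapes) as it is.

```python
import string
from urllib.parse import quote, unquote


def to_ascii_url(url: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes.

    Hypothetical helper for illustration. Passing string.printable as
    `safe` tells quote() to leave all printable ASCII characters alone,
    so the scheme, slashes and any existing %XX escapes are not touched.
    """
    return quote(url, safe=string.printable)


url = "https://gvo.wiki/html/NPC掉落書籍.html"
encoded = to_ascii_url(url)

print(encoded)                  # ASCII-only form of the URL
print(unquote(encoded) == url)  # decoding restores the original URL
```

Note that `safe=string.printable` also leaves spaces unencoded, so a URL that contains literal spaces would probably still need separate handling.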