python-sitemap

URL UnicodeEncodeError

Open wkingnet opened this issue 3 years ago • 0 comments

If a URL contains non-ASCII (Unicode) characters, the crawler fails on it with a UnicodeEncodeError.

Debug output:

INFO:root:Crawling #1: https://gvo.wiki/html/NPC掉落書籍.html
DEBUG:root:https://gvo.wiki/html/NPC掉落書籍.html ==> 'ascii' codec can't encode characters in position 13-16: ordinal not in range(128)
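
The error can probably be reproduced without the crawler. The sketch below assumes that the HTTP client encodes the request line as ASCII before sending it; the `request_line` string and the `try_ascii` helper are hypothetical and are not part of python-sitemap.

```python
# Assumption: the request line ("GET <path>") is encoded as ASCII before sending.
# request_line is a hypothetical request line for the URL from the debug output.
request_line = "GET /html/NPC掉落書籍.html"


def try_ascii(text):
    """Return None if text is pure ASCII, else the UnicodeEncodeError raised."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


error = try_ascii(request_line)
# Prints the codec message, including the position of the offending characters.
print(error)
```

Under that assumption, the four CJK characters in the path are the ones the ASCII codec rejects, which would explain the reported positions.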

Solution:

  1. Edit crawler.py and add the following imports at the top:
import string
from urllib.parse import quote
  2. Search for the line current_url = self.urls_to_crawl.pop()

  3. Add a line below it that percent-encodes the non-ASCII characters in the URL:

current_url = self.urls_to_crawl.pop()
current_url = quote(current_url, safe=string.printable)
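
To check what this fix does to a URL, here is a minimal standalone sketch. The helper name `percent_encode_non_ascii` is hypothetical and only wraps the `quote` call from the steps above; the URL is the one from the debug output.

```python
import string
from urllib.parse import quote, unquote


def percent_encode_non_ascii(url):
    """Percent-encode non-ASCII characters and leave printable ASCII untouched."""
    # string.printable contains only ASCII characters, including '%', ':' and '/',
    # so the URL structure and any existing percent-escapes are kept as they are.
    return quote(url, safe=string.printable)


raw = "https://gvo.wiki/html/NPC掉落書籍.html"
encoded = percent_encode_non_ascii(raw)
print(encoded)
# → https://gvo.wiki/html/NPC%E6%8E%89%E8%90%BD%E6%9B%B8%E7%B1%8D.html

# The result is pure ASCII, so encoding it as ASCII no longer fails.
encoded.encode("ascii")
```

Because `%` is in the safe set, applying the helper to an already encoded URL leaves it unchanged, so a URL that passes through this line twice is not double-encoded. `unquote(encoded)` returns the original string.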

wkingnet, Apr 28 '22 12:04