python-sitemap-generator

Missing dependency in list, plus ERROR due to unsanitized special characters

Open cooperdk opened this issue 3 years ago • 2 comments

Hi,

Your dependency list is missing a package that is not available by default. Why not include a requirements.txt file instead, as per Python conventions? The missing package is:

lxml

Also, you need to sanitize the URLs to avoid errors with international and special characters. It's really easy:

sanitized_string = htmlentities(unsanitized_string)

You should just append the sanitized URL to the queue, I imagine.
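Note that `htmlentities` is a PHP function and has no built-in of that name in Python. A minimal Python sketch of the same idea, assuming the goal is to percent-encode non-ASCII characters and then escape XML special characters before the URL is written to the sitemap, could look like the following. The function name `sanitize_url` is hypothetical and not part of the project; non-ASCII host names would additionally need IDNA encoding, which is left out here.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def sanitize_url(url: str) -> str:
    """Hypothetical helper: percent-encode, then XML-escape a URL."""
    parts = urlsplit(url)
    # Keep "%" safe so already-encoded sequences are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    encoded = urlunsplit((parts.scheme, parts.netloc, path, query, fragment))
    # Escape &, < and > so the URL is valid inside an XML <loc> element.
    return escape(encoded)


if __name__ == "__main__":
    print(sanitize_url("http://example.com/søgning?q=a&b=ü"))
```

The result of `sanitize_url(url)` is what would be appended to the crawl queue instead of the raw URL, assuming the queue is a plain list of strings.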

cooperdk, Jul 06 '22 23:07

Great utility, thanks.

I have seen the exact same thing with lxml and UTF-8 web pages. For example: http://www.themadhowes.org.uk/kpop/subtitles.html

timbly5000, Jul 15 '22 06:07

@cooperdk, I'm not familiar with Python standards. Would you mind creating a PR with those changes?

wiejakp, Dec 26 '22 07:12