webpage2html
Unicode symbols handling
Some symbols are handled incorrectly. For example:
Original: “Smoking Kills.”
Result: ΓÇ£Smoking Kills.ΓÇ¥
Original: lawyers’
Result: lawyersΓÇÖ
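The garbled output above is consistent with UTF-8 bytes being decoded with a legacy code page. A minimal sketch that reproduces both examples, assuming the wrong codec is code page 437 (an inference from the characters shown, not something confirmed from the webpage2html source):

```python
# Reproduce the reported garbling: encode as UTF-8, then decode
# with the wrong codec. cp437 is an assumption inferred from the output.
samples = ["“Smoking Kills.”", "lawyers’"]

for original in samples:
    garbled = original.encode("utf-8").decode("cp437")
    print(f"Original: {original}  Result: {garbled}")

# The damage is reversible as long as no bytes were dropped along the way.
garbled = "“Smoking Kills.”".encode("utf-8").decode("cp437")
restored = garbled.encode("cp437").decode("utf-8")
```

If this matches what you see, the page bytes themselves are probably fine and only the decoding step picks the wrong encoding.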
Any demo html page for testing?
Sorry for such a late response, but I failed to reproduce the result you described.
With this HTTP header alone, it does not work for me:
content-type: text/html
With this HTTP header, it works for me:
content-type: text/html; charset=UTF-8
In other words, webpage2html gets confused when the charset=UTF-8 parameter is missing from the Content-Type HTTP header. Whether this should be considered a bug, I don't know, but it is perhaps worth documenting.
In my case, it helped to add charset utf-8; to the nginx location config.
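For illustration, here is a rough sketch of how a client could read the charset from the Content-Type header and fall back to UTF-8 when it is missing. This is not webpage2html's actual code; decode_body and its fallback parameter are hypothetical names, and defaulting to UTF-8 is an assumption about what a reasonable fix might look like.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Decode a response body using the header's charset, else the fallback.

    Hypothetical helper, not part of webpage2html.
    """
    # Let the standard library parse the header parameters.
    msg = Message()
    msg["content-type"] = content_type
    charset = msg.get_content_charset() or fallback
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name in the header: use the fallback instead.
        return body.decode(fallback, errors="replace")


page = "“Smoking Kills.”".encode("utf-8")
without_charset = decode_body(page, "text/html")
with_charset = decode_body(page, "text/html; charset=UTF-8")
```

With this fallback, both header variants from the comment above decode to the same text.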