webpage2html

Unicode symbols handling

Open ralfeus opened this issue 8 years ago • 4 comments

Some symbols are handled incorrectly. For example:

Original: “Smoking Kills.”
Result: ΓÇ£Smoking Kills.ΓÇ¥

Original: lawyers’
Result: lawyersΓÇÖ
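
The garbled output above matches what you get when UTF-8 bytes are decoded as code page 437 (a legacy Windows console encoding). Whether that is what actually happens inside webpage2html is an assumption. The sketch below only shows that the symptom can be reproduced and reversed this way. The function names `mojibake` and `repair` are made up for illustration.

```python
# Assumption: the page bytes are UTF-8, but something decodes them as cp437.
# `mojibake` and `repair` are hypothetical helper names, not part of webpage2html.

def mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as code page 437."""
    return text.encode("utf-8").decode("cp437")


def repair(text: str) -> str:
    """Reverse the mistake: recover the original bytes, then decode as UTF-8."""
    return text.encode("cp437").decode("utf-8")


if __name__ == "__main__":
    for sample in ("\u201cSmoking Kills.\u201d", "lawyers\u2019"):
        broken = mojibake(sample)
        print(sample, "->", broken, "->", repair(broken))
```

If the round trip reproduces the strings from the report, the input is probably fine and the wrong codec is being picked somewhere between download and output.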

ralfeus avatar Oct 10 '17 13:10 ralfeus

Any demo html page for testing?

zTrix avatar Oct 14 '17 08:10 zTrix

Here it is

Book.zip

ralfeus avatar Oct 14 '17 12:10 ralfeus

Sorry for the very late response, but I could not reproduce the result you reported.

zTrix avatar Jan 02 '18 13:01 zTrix

This HTTP header alone does not work for me:

content-type: text/html

This HTTP header works for me:

content-type: text/html; charset=UTF-8

In other words, webpage2html gets confused when the content-type HTTP header lacks charset=UTF-8. Whether this should be considered a bug, I don't know, but it is perhaps worth documenting.
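
For illustration, here is a minimal sketch of how a client could read the charset from the content-type header and fall back to UTF-8 when it is missing. This is not webpage2html's actual code. The helper names `charset_from_content_type` and `decode_body` are hypothetical, and defaulting to UTF-8 is an assumption (a page may also declare its encoding in a meta tag, which this sketch ignores).

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper; uses the standard library's header parser.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lower-cased charset, or None if absent.
    return msg.get_content_charset() or default


def decode_body(body: bytes, header_value: str) -> str:
    """Decode response bytes using the header's charset (UTF-8 if missing)."""
    return body.decode(charset_from_content_type(header_value), errors="replace")


if __name__ == "__main__":
    body = "\u201cSmoking Kills.\u201d".encode("utf-8")
    print(decode_body(body, "text/html"))
    print(decode_body(body, "text/html; charset=UTF-8"))
```

With a fallback like this, both header variants above would decode the same UTF-8 body identically.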

In my case, it helped to add charset utf-8; to the nginx location config.

adrelanos avatar Nov 27 '20 11:11 adrelanos