Incorrect handling of UTF-8 encoding during preprocessing

Open refack opened this issue 8 months ago • 2 comments

Example:

cppreference-doc-20250209\reference\en.cppreference.com\w\cpp\header\bit.html:4, which is the page title:

  • The raw dump has it right: <title>Standard library header &lt;bit> (C++20) - cppreference.com</title>. It is plain UTF-8. (screenshot attached)
  • The HTML in the zip (and in the .tar.xz) is encoded twice: html-book-20250209.zip\reference\en\cpp\header\bit.html has <title>Standard library header &lt;bit&gt; (C++20) - cppreference.com</title>, which adds \u00C2 cruft. (screenshot attached)
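
For anyone trying to reproduce this: the stray \u00C2 is what you would expect if UTF-8 bytes were decoded as Latin-1 and then written out as UTF-8 again. The sketch below is only an illustration of that mechanism, not the project's preprocessing code. It assumes the title contains a non-breaking space (U+00A0), which I have not confirmed from the raw bytes, and the helper names `double_encode` and `repair` are made up for this example.

```python
def double_encode(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes misread as Latin-1.

    Each byte of the UTF-8 encoding becomes its own code point, so a
    two-byte sequence such as C2 A0 turns into U+00C2 followed by U+00A0.
    """
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo one round of the misdecoding (raises if text is not mojibake)."""
    return text.encode("latin-1").decode("utf-8")


# Hypothetical title fragment with a non-breaking space before "(C++20)".
original = "<bit>\u00a0(C++20)"
broken = double_encode(original)

# The non-breaking space now carries a leading U+00C2.
assert broken == "<bit>\u00c2\u00a0(C++20)"
# Reversing the step restores the original text.
assert repair(broken) == original
```

If this is the cause, the fix is probably to read the dumped files with an explicit UTF-8 encoding at whichever step currently guesses or defaults to a single-byte one.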

I'd be happy to look into it.

refack, Apr 20 '25 14:04