
Enhanced Non-UTF-8 HTML Support

Open · akshithio opened this issue 9 months ago

Copied from #253:

This follows the recommended tips and tricks to provide better support for different character encodings and character sets in HTML pages.
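As a rough illustration of what "supporting different character encodings" involves, the sketch below converts an HTML body to UTF-8 before parsing, using the `charset` parameter of the HTTP `Content-Type` header. It is a minimal stdlib-only sketch, not the code from the PR: `charsetFromContentType` and `toUTF8` are hypothetical names, and only UTF-8 and ISO-8859-1 are handled. It assumes UTF-8 when no charset is declared, whereas real pages also need BOM and `<meta charset>` sniffing, and browsers treat the `iso-8859-1` label as windows-1252, which differs in the 0x80 to 0x9F range. A fuller implementation would more likely use `charset.NewReader` from `golang.org/x/net/html/charset`, which performs that sniffing.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
	"unicode/utf8"
)

// charsetFromContentType returns the lower-cased charset parameter of a
// Content-Type header, or "" if it is missing or the header is invalid.
func charsetFromContentType(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return strings.ToLower(params["charset"])
}

// toUTF8 (hypothetical helper) converts an HTML body to a UTF-8 string.
// Only UTF-8 and plain ISO-8859-1 are supported in this sketch; a missing
// charset is assumed to mean UTF-8 (no BOM or <meta> sniffing is done).
func toUTF8(body []byte, contentType string) (string, error) {
	switch cs := charsetFromContentType(contentType); cs {
	case "", "utf-8", "utf8":
		if !utf8.Valid(body) {
			return "", fmt.Errorf("body is not valid UTF-8")
		}
		return string(body), nil
	case "iso-8859-1", "latin1":
		// Each ISO-8859-1 byte equals the Unicode code point of its
		// character, so every byte becomes exactly one rune.
		runes := make([]rune, len(body))
		for i, b := range body {
			runes[i] = rune(b)
		}
		return string(runes), nil
	default:
		return "", fmt.Errorf("unsupported charset %q", cs)
	}
}

func main() {
	// "café" encoded as ISO-8859-1: byte 0xE9 is "é".
	latin1 := []byte{'c', 'a', 'f', 0xE9}
	s, err := toUTF8(latin1, "text/html; charset=ISO-8859-1")
	fmt.Println(s, err) // café <nil>
}
```

Decoding to UTF-8 up front keeps the rest of the extraction code working on a single encoding.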

This adds test cases to `html_test.go`, changes `html.go` to improve maintainability and robustness where applicable, and adds methods to `models/url.go` to handle different character encodings. The existing test cases in `html_test.go` have also been changed to check which assets are extracted, rather than only how many.
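The switch from counting extracted assets to checking the assets themselves can be motivated with a small example. The sketch below is illustrative only: `sameAssets` is a hypothetical helper, not taken from `html_test.go`, and it assumes Go 1.21+ for the `slices` package. It compares two lists of asset URLs ignoring order but not duplicates.

```go
package main

import (
	"fmt"
	"slices"
)

// sameAssets (hypothetical helper) reports whether two lists of extracted
// asset URLs hold the same elements, ignoring order but not duplicates.
func sameAssets(got, want []string) bool {
	g := slices.Clone(got)
	w := slices.Clone(want)
	slices.Sort(g)
	slices.Sort(w)
	return slices.Equal(g, w)
}

func main() {
	want := []string{"https://example.com/a.css", "https://example.com/b.js"}
	got := []string{"https://example.com/b.js", "https://example.com/wrong.png"}

	// A count-only check passes even though the wrong asset was extracted.
	fmt.Println(len(got) == len(want)) // true
	// Comparing the contents catches the mistake.
	fmt.Println(sameAssets(got, want)) // false
}
```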

I'm open to any feedback.


I also took into account the review comments on that PR asking not to use idna manually to convert URLs and hostnames to ASCII. I've also incorporated the changes to html.go from commits 93f6658 and e2b245b, which would have caused a merge conflict with my original PR.

Attempts to close #169.

akshithio, Apr 07 '25 19:04