broken-link-checker
UTF8 characters cause valid links to be detected as broken
I prepared a test case at https://github.com/matkoniecz/broken-link-checker-local-utf8
blc https://matkoniecz.github.io/broken-link-checker-local-utf8 -r
See https://matkoniecz.github.io/broken-link-checker-local-utf8/ - both links work, but the one with UTF-8 characters gets BLC_UNKNOWN/HTTP_undefined errors:
mateusz@grima:~$ blc https://matkoniecz.github.io/broken-link-checker-local-utf8 -r
Getting links from: https://matkoniecz.github.io/broken-link-checker-local-utf8
├───OK─── https://matkoniecz.github.io/broken-link-checker-local-utf8/test%20space.html
└─BROKEN─ https://matkoniecz.github.io/broken-link-checker-local-utf8/test_zażółć.html (BLC_UNKNOWN)
Finished! 2 links found. 1 broken.
Getting links from: https://matkoniecz.github.io/broken-link-checker-local-utf8/test%20space.html
└─BROKEN─ https://matkoniecz.github.io/broken-link-checker-local-utf8/test_zażółć.html (HTTP_undefined)
Finished! 2 links found. 1 excluded. 1 broken.
Finished! 4 links found. 1 excluded. 2 broken.
Elapsed time: 1 second
Sorry if this is my misunderstanding, but as I understand it, UTF-8 de facto works in links.
UTF-8 links may be handled differently internally, but browsers seem 100% fine with links that include such letters, for example those described at https://en.wikipedia.org/wiki/Ogonek
Sanity check: https://stackoverflow.com/questions/22357509/can-urls-have-utf-8-characters
Even DNS supports UTF-8 characters (with some workarounds and restrictions): https://en.wikipedia.org/wiki/Internationalized_domain_name
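To illustrate why browsers cope with such links: before sending a request, they percent-encode the non-ASCII characters in the path as UTF-8 bytes. Below is a minimal sketch of that step, assuming Node.js and its built-in WHATWG `URL` class. `toAsciiUrl` is a hypothetical helper written for this illustration, not part of blc, and the sketch does not claim to show what blc does internally.

```javascript
// Hypothetical helper (not part of broken-link-checker): returns the
// serialized form of a URL, in which non-ASCII path characters are
// percent-encoded as UTF-8 bytes by the WHATWG URL parser.
function toAsciiUrl(rawUrl) {
  return new URL(rawUrl).href;
}

// The link from the test case above that blc reports as broken.
const raw =
  "https://matkoniecz.github.io/broken-link-checker-local-utf8/test_zażółć.html";
const encoded = toAsciiUrl(raw);

console.log(encoded);
// → https://matkoniecz.github.io/broken-link-checker-local-utf8/test_za%C5%BC%C3%B3%C5%82%C4%87.html

// Decoding the ASCII form gives back the original link, so both
// spellings refer to the same resource.
const roundTrips = decodeURI(encoded) === raw;
console.log(roundTrips);
```

If this is indeed the missing step, feeding the encoded form to a checker would be one possible workaround until the raw form is handled, but that is an assumption and not something confirmed in this thread.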
This replaces https://github.com/LukasHechenberger/broken-link-checker-local/issues/50
I have the same problem with websites in Chinese and Thai. Although the links exist, the program reports a BLC_UNKNOWN error.
I have the same problem with grave and acute accents, which are very common in Romance languages and present in other languages too. For example: https://www.iswatersafetodrink.in/Italy/Cantù
Can I do anything so that the first step, "needs confirmation", can be dropped?