hldig
htdig cannot crawl urls with Hindi characters
When running Jekyll locally, the console showed the following when I ran htdig -i against the Jekyll web server at http://localhost:4000:
[2017-11-12 14:44:22] ERROR bad URI `/tag/कैनबिस/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/अध-ययन/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/मनःचिकित-सा/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/कोगनीटिव/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/दवा/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/चिकित-सा/'
I guess the whole String/Retriever/HtWord* code would need to be converted to handle Unicode, perhaps with iconv? I'm not experienced enough to take this on, though.
That's a great hint (if you're right). Thanks!
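For reference, here is a minimal sketch of what a percent-encoded request path for one of these tag URLs would look like. It rests on an assumption that nothing in this thread confirms: that the "bad URI" errors appear because the path reaches the server as raw UTF-8 bytes instead of the percent-encoded form that RFC 3986 expects. The helper name `encode_path` is hypothetical and is not part of htdig or hldig. Python is used only for illustration.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path (hypothetical helper, not hldig code).

    Non-ASCII characters are first encoded as UTF-8, then each byte is
    written as %XX. The "/" separators are left untouched.
    """
    return quote(path, safe="/", encoding="utf-8")


if __name__ == "__main__":
    raw = "/tag/दवा/"
    encoded = encode_path(raw)
    print(encoded)
    # Decoding the encoded path gives back the original string.
    print(unquote(encoded) == raw)
```

If the assumption holds, a crawler that sends the encoded form should no longer trigger the error, and the word-indexing code (String/Retriever/HtWord*) would be a separate question.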