hldig

htdig cannot crawl URLs with Hindi characters

Open • andy5995 opened this issue 6 years ago • 2 comments

While running Jekyll locally, I ran htdig -i to crawl the Jekyll web server at http://localhost:4000, and the following errors were output in the console:

[2017-11-12 14:44:22] ERROR bad URI `/tag/कैनबिस/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/अध-ययन/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/मनःचिकित-सा/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/कोगनीटिव/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/दवा/'.
[2017-11-12 14:44:22] ERROR bad URI `/tag/चिकित-सा/'
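One likely cause, which the thread does not confirm and which is therefore an assumption, is that the crawler puts the raw UTF-8 bytes of the path into the HTTP request line. RFC 3986 requires any byte outside the unreserved set to be percent-encoded, and the Jekyll development server rejects the request as a "bad URI". The sketch below is not htdig code, and the helper name `percent_encode_path` is hypothetical. It shows the encoding a client would apply to such a path before sending it:

```python
# Hypothetical helper (not part of htdig): percent-encode a URL path per
# RFC 3986 so non-ASCII characters are sent as %XX escapes of their UTF-8 bytes.

# Bytes allowed to appear unescaped: RFC 3986 unreserved characters plus "/".
_SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~/")


def percent_encode_path(path: str) -> str:
    """Return `path` with every UTF-8 byte outside _SAFE written as %XX (uppercase hex)."""
    out = []
    for byte in path.encode("utf-8"):  # iterating bytes yields ints 0..255
        if byte in _SAFE:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


if __name__ == "__main__":
    # Each Devanagari character becomes three %XX escapes.
    print(percent_encode_path("/tag/दवा/"))
```

In Python itself, `urllib.parse.quote(path)` produces the same result with its default `safe="/"`. The loop is written out byte by byte to mirror what a C or C++ crawler would have to do.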

andy5995 • Nov 12 '17 20:11

I guess the whole String/Retriever/HtWord* code would have to be converted to handle Unicode, using iconv? I'm not experienced enough to take this on, though.
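The rationale behind that suggestion can be illustrated as follows. A word splitter that works byte by byte, treating each byte as a character, sees a Devanagari letter as three unrelated bytes. It also cannot tell that vowel signs, the virama and the visarga belong to the word. The hedged Python sketch below decodes the raw bytes first, which is the step iconv would perform in C or C++. It then keeps letters (L*), combining marks (M*) and decimal digits (Nd) together as word characters. The function name and the category rule are illustrative assumptions, not htdig's actual HtWord logic:

```python
import unicodedata
from typing import List


def words_from_bytes(data: bytes, charset: str = "utf-8") -> List[str]:
    """Hypothetical sketch: decode document bytes, then split them into words.

    Decoding first (iconv's job in C/C++) means each code point is seen whole.
    Letters (L*), combining marks (M*) and decimal digits (Nd) count as word
    characters, so Devanagari vowel signs and the virama stay in their word.
    """
    text = data.decode(charset, errors="replace")
    words, current = [], []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat[0] in ("L", "M") or cat == "Nd":
            current.append(ch)
        elif current:  # any other character ends the current word
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    print(words_from_bytes("चिकित्सा and दवा".encode("utf-8")))
```

In C or C++ the category check would need Unicode character data, for example from a library such as ICU, because the standard `isalnum()` family classifies single bytes or locale-dependent characters rather than Unicode categories.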

martijndeb • Dec 30 '17 17:12

That's a great hint, though (if you're right). Thanks!

andy5995 • Dec 30 '17 20:12