
URLs are tokenized into individual characters

Open joshweir opened this issue 7 years ago • 1 comments

Tokenizer::WhitespaceTokenizer.new.tokenize("www.google.com") # => ["www", ".", "g", "o", "o", "g", "l", "e", ".", "c", "o", "m"]

I want website URLs to be tokenized as a single token (effectively a single noun), so I would expect www.google.com to tokenize as ["www.google.com"]. I am happy to fork this repo and would like to contribute a fix.
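To make the expected behaviour concrete, here is a minimal standalone sketch of the kind of URL-preserving split I have in mind. It does not use or modify the gem: `tokenize_keeping_urls` and `URL_LIKE` are hypothetical names invented for this illustration, and the URL pattern is a deliberately loose assumption, not a full URL grammar (it would also treat something like "Hello.World" as a URL).

```ruby
# Hypothetical helper for illustration only; not part of the tokenizer gem.
# Loose "looks like a URL" pattern (an assumption): optional http(s) scheme,
# dot-separated host labels, a top-level label of 2+ letters, optional path.
URL_LIKE = %r{\A(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?\z}i

def tokenize_keeping_urls(text)
  # Split on whitespace first, then decide per chunk how to break it up.
  text.split.flat_map do |chunk|
    # Peel off trailing sentence punctuation so "www.google.com." still
    # yields the URL as one token followed by ".".
    core, trailing = chunk.match(/\A(.*?)([.,;:!?]*)\z/m).captures
    if URL_LIKE.match?(core)
      [core, *trailing.chars]
    else
      # Fallback: runs of word characters, or single punctuation marks.
      chunk.scan(/\w+|[^\w\s]/)
    end
  end
end

p tokenize_keeping_urls("www.google.com")
p tokenize_keeping_urls("Visit www.google.com.")
```

With this sketch, "www.google.com" comes back as one token, and a sentence-final period after a URL is still split off as its own token.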

Thanks for the repo, by the way; it's useful.

joshweir avatar Mar 28 '17 04:03 joshweir

@joshweir Thank you for reporting this. I'll review it next week.

arbox avatar Mar 28 '17 08:03 arbox