tokenizer
URLs are tokenized into individual characters
Tokenizer::WhitespaceTokenizer.new.tokenize "www.google.com"
=> ["www", ".", "g","o","o","g","l","e",".","c","o","m"]
I want website URLs to be tokenized as a single token (effectively a single noun), so I would expect www.google.com to tokenize as ["www.google.com"]. I am happy to fork this repo and would like to contribute a fix.
Thanks for the repo, by the way; it's useful.
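To make the expected behaviour concrete, here is a rough standalone sketch. It is not the gem's implementation and does not touch Tokenizer::WhitespaceTokenizer; the helper name tokenize_keeping_urls and the URL_LIKE pattern are hypothetical, and the pattern is a deliberately loose assumption about what counts as a URL (optional http/https scheme, dot-separated labels, optional path).

```ruby
# Hypothetical helper for illustration only; not part of the tokenizer gem.
# Assumption: a chunk is "URL-like" if it is an optional http(s) scheme,
# one or more dot-terminated labels, a TLD of 2+ letters, and an optional path.
URL_LIKE = %r{\A(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:[/?#]\S*)?\z}i

def tokenize_keeping_urls(text)
  # Split on whitespace first, then decide per chunk.
  text.split.flat_map do |chunk|
    if chunk.match?(URL_LIKE)
      # Keep URL-like chunks intact as a single token.
      [chunk]
    else
      # Otherwise separate runs of letters/digits from punctuation marks.
      chunk.scan(/[[:alnum:]]+|[^[:alnum:][:space:]]/)
    end
  end
end

tokenize_keeping_urls("www.google.com")
# => ["www.google.com"]
tokenize_keeping_urls("Visit www.google.com now!")
# => ["Visit", "www.google.com", "now", "!"]
```

This sketch does not handle a URL immediately followed by sentence punctuation (for example "www.google.com."), which would still be split apart; a real fix would need to decide how to treat that case.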
@joshweir thank you for reporting, I'll review this next week.