Avik Sengupta
@chinglamchoi do you have the code for this exercise available somewhere?
Will be very useful, please submit a PR, preferably with some tests and docs.
Tf-idf.jl would be best, I think
Not yet, but might be worth adding.
Not sure what we can do about this. Everyone just seems to use the Snowball stemmer.
Sure, thanks. Looks OK. Note that `utf8` is deprecated in 0.5, you'll need to use `Compat.UTF8String`. I've just fixed all the other deprecations on 0.5.
> `r[i] = (UInt8(chr) != 0xfffd) ? chr : ' '` Not all Unicode characters will fit in a UInt8. The line above will lose all non-ASCII characters from the...
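To illustrate the point about `UInt8`: a `UInt8` holds at most `0xff`, so comparing one against `0xfffd` can never be equal, and on current Julia versions converting a `Char` above `0xff` to `UInt8` throws an `InexactError`. A minimal sketch of one possible alternative is to compare the `Char` itself against the replacement character. The helper name `replace_invalid` is hypothetical and not taken from the PR under review.

```julia
# Hypothetical helper (not from the original PR): replace only the Unicode
# replacement character U+FFFD with a space and keep every other character,
# including non-ASCII ones, unchanged.
function replace_invalid(s::AbstractString)
    # Compare Char to Char; no narrowing conversion to UInt8 is involved.
    return map(c -> c == '\ufffd' ? ' ' : c, s)
end

# Example input containing a non-ASCII character and one U+FFFD.
cleaned = replace_invalid("naïve \ufffd text")
println(cleaned)
```

Comparing characters directly avoids the narrowing conversion, so non-ASCII text such as `ï` passes through untouched.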
Might be a good idea. Will need some thought on how to deprecate the existing behaviour. Care to do a PR?
I'd say submit a PR. We can figure out performance later. Slow code is better than no code.