Michael Kohler

87 comments by Michael Kohler

Looks like these are indeed CC0. I don't think we need to ask legal for this. @nukeador do you agree? Would love to see a selection of these sentences. Also,...

@J-Wrobel did you have any luck getting a third reviewer?

Also, can you remove the sample file and host it somewhere online instead? Eventually, we do not want this to be part of the source code here.

Thanks for your answers! > Yes, I have included a long list of less frequently occurring words in the disallowed_words/hindi.txt file. There are about 150K words in this list. Which...

@karthiksibm can you please also have a look at the other comments I've made?

These numbers are a bit too high. @nukeador I forgot what the required minimum was, can you remind me? Can you look at the sentences and see if you can...

The error rate should be between 5% and 7%. Anything lower is of course great, but probably very hard to achieve.

> To filter out such long words, is there a parameter to set the max_characters per word or max_trimmed_length, like the opposite of min_characters or min_trimmed_length that we have? There...

@karthiksibm it seems you can use `\\w{5,50}` in the `abbreviation_patterns` to exclude any words longer than 4 characters. Adjust `5` to the maximum of characters per word that should be...
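
To illustrate what a pattern like this matches, here is a minimal sketch in Python. It only demonstrates the regex itself, not how the extractor applies its rules; the helper `has_long_word` and the sample sentences are made up for illustration. The doubled backslash in `\\w{5,50}` is assumed to be escaping inside the rules file, so the effective regex is `\w{5,50}`.

```python
import re


def has_long_word(sentence: str, max_chars: int = 4) -> bool:
    """Hypothetical helper: True if any word has more than max_chars characters.

    Builds a pattern of the form \\w{N,50} with N = max_chars + 1, which matches
    any run of at least N word characters.
    """
    pattern = re.compile(r"\w{%d,50}" % (max_chars + 1))
    return pattern.search(sentence) is not None


# Made-up sample sentences, not taken from any real data set.
sentences = ["I am at home", "This sentence has longer words"]

# Keep only sentences in which no word exceeds 4 characters.
kept = [s for s in sentences if not has_long_word(s)]
print(kept)
```

With `max_chars=4` the second sentence is dropped because "sentence" has 8 characters. Raising `max_chars` loosens the filter accordingly.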

If you merge the latest master into your branch, you can also use the `other_patterns` config rule to add that. Then it's not so confusing, as that's not really an abbreviation...