
Added function for detecting message lang

Open emills11 opened this issue 7 years ago • 2 comments

I went ahead and made a basic function for detecting the language of a message, so that the message can be identified as written in either the user's target language (so it can be seen by other users) or their native language (so it can be ignored).

I did run into an issue with the langdetect library: because its algorithm is probability-based, it occasionally misidentifies a message's language when the message contains spelling errors. For example, "Hello World!" is detected as English, while "Helo Woorld!" is detected as Dutch. I could use some help coming up with a solution to this problem.

emills11, Nov 16 '18 05:11

I may have found a solution to the problem above: iterate through the Language objects returned by detect_langs() and check whether any of the probable languages matches either the user's target language or native language. I will push a second commit when I get home.
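A minimal sketch of that idea, not the code from the actual commit. It assumes langdetect's detect_langs() returns Language objects with `lang` and `prob` attributes, ordered from most to least probable. The names `classify_message`, `langdetect_candidates`, and the `min_prob` cutoff are hypothetical and only for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Candidate = Tuple[str, float]  # (language code, probability)


def classify_message(
    candidates: Iterable[Candidate],
    target_lang: str,
    native_lang: str,
    min_prob: float = 0.1,  # hypothetical cutoff; tune or drop as needed
) -> Optional[str]:
    """Return "target", "native", or None if neither language is a candidate.

    Candidates are checked in the order given, so the first match wins
    when the input is sorted most-probable first.
    """
    for lang, prob in candidates:
        if prob < min_prob:
            continue  # skip very unlikely guesses
        if lang == target_lang:
            return "target"
        if lang == native_lang:
            return "native"
    return None


def langdetect_candidates(text: str) -> List[Candidate]:
    """Convert langdetect's Language objects into (lang, prob) pairs."""
    # Imported here so classify_message can be used and tested
    # without langdetect installed.
    from langdetect import detect_langs

    return [(guess.lang, guess.prob) for guess in detect_langs(text)]
```

With this shape, a misspelled message would be handled as `classify_message(langdetect_candidates("Helo Woorld!"), "en", "ja")`: even if Dutch is the top guess, the message still counts as target-language as long as English appears among the candidates above the cutoff. Whether English actually appears for a given input depends on langdetect's output, which can vary between runs unless a seed is set.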

emills11, Nov 16 '18 18:11

Thank you. I'll take a look at this after I publish my next video about #22.

ykdojo, Nov 16 '18 23:11