Detect multiple languages in mixed-language text

Open • pemistahl opened this issue 2 years ago • 1 comment

Currently, only the most likely language is returned for a given input string. However, if the input contains contiguous sections in different languages, it would be desirable to detect all of them and return an ordered sequence of items, where each item consists of a start index, an end index and the detected language.

Input: He turned around and asked: "Entschuldigen Sie, sprechen Sie Deutsch?"

Output:

[
  {"start": 0, "end": 27, "language": ENGLISH}, 
  {"start": 28, "end": 69, "language": GERMAN}
]
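To make the requested output shape concrete, here is a minimal Go sketch. It is not lingua-go's API: `Segment`, `Classifier`, `DetectSegments` and `toyClassify` are hypothetical names invented for illustration, and the word-list classifier is only a stand-in for a real language model. The sketch labels each whitespace-separated word, merges neighbouring words with the same label, and reports inclusive byte offsets so that the segments cover the whole input, as in the example above.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is a hypothetical result item: inclusive byte offsets plus a language label.
type Segment struct {
	Start    int
	End      int
	Language string
}

// Classifier is a hypothetical per-word language guesser.
type Classifier func(word string) string

// DetectSegments labels every whitespace-separated word and merges adjacent
// words that share a label. Each segment runs up to the byte just before the
// next segment starts, so the segments cover the whole input without gaps.
func DetectSegments(text string, classify Classifier) []Segment {
	var segments []Segment
	pos := 0
	for _, word := range strings.Fields(text) {
		// Only whitespace lies between pos and the next word, so Index finds it.
		start := pos + strings.Index(text[pos:], word)
		pos = start + len(word)
		lang := classify(word)

		n := len(segments)
		if n > 0 && segments[n-1].Language == lang {
			continue // same language: the current segment keeps growing
		}
		if n > 0 {
			segments[n-1].End = start - 1 // close the previous segment
		} else {
			start = 0 // the first segment also covers any leading whitespace
		}
		segments = append(segments, Segment{Start: start, Language: lang})
	}
	if n := len(segments); n > 0 {
		segments[n-1].End = len(text) - 1
	}
	return segments
}

// toyClassify is an assumption for demonstration only: a tiny German word
// list, with everything else treated as English.
func toyClassify(word string) string {
	german := map[string]bool{
		"entschuldigen": true, "sie": true, "sprechen": true, "deutsch": true,
	}
	w := strings.ToLower(strings.Trim(word, `"'.,:;?!`))
	if german[w] {
		return "GERMAN"
	}
	return "ENGLISH"
}

func main() {
	text := `He turned around and asked: "Entschuldigen Sie, sprechen Sie Deutsch?"`
	for _, s := range DetectSegments(text, toyClassify) {
		fmt.Printf("%d-%d %s\n", s.Start, s.End, s.Language)
	}
}
```

With the toy classifier, this yields the two segments from the example (0 to 27 English, 28 to 69 German). Offsets are byte-based, so a real implementation would need to decide whether to report bytes or characters for non-ASCII input.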

pemistahl, Jan 22 '22 18:01

This would be quite a useful feature!

khalilsarwari, Jan 25 '22 21:01