lingua
Detection of long texts does not run in parallel
Detection of long texts (or usage of `withLowAccuracyMode()`) only uses a single worker thread for language detection.
The reason is that one work task is submitted per ngram length. However, for long texts and when using `withLowAccuracyMode()`, only ngram length 3 is checked, so only a single work task is submitted. One solution might be to perform the per-language computation in `computeLanguageProbabilities` as a separate work task for each language; however, that approach will probably only be worth it if the input is long enough (I have not verified this).
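To illustrate the idea, here is a minimal sketch in plain Java of submitting one work task per language instead of one per ngram length. It does not use lingua's actual internals: `PerLanguageTasks`, `scoreLanguage`, `scoreAllLanguages`, and the map-based trigram models are all hypothetical stand-ins, and the scoring (a sum of log probabilities over trigrams) is only an assumption about what the per-language computation roughly looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not lingua's real code.
public class PerLanguageTasks {

    // Split the text into overlapping character trigrams.
    static List<String> trigrams(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            result.add(text.substring(i, i + 3));
        }
        return result;
    }

    // Assumed per-language computation: sum the log probabilities of all
    // trigrams that the (hypothetical) language model knows about.
    static double scoreLanguage(Map<String, Double> model, List<String> grams) {
        double sum = 0.0;
        for (String gram : grams) {
            Double logProbability = model.get(gram);
            if (logProbability != null) {
                sum += logProbability;
            }
        }
        return sum;
    }

    // Submit one task per language, so that even with a single ngram length
    // the work is spread across several worker threads.
    static Map<String, Double> scoreAllLanguages(
            Map<String, Map<String, Double>> models, String text, int threads) {
        List<String> grams = trigrams(text);
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Double>> pending = new TreeMap<>();
            for (Map.Entry<String, Map<String, Double>> entry : models.entrySet()) {
                Map<String, Double> model = entry.getValue();
                Callable<Double> task = () -> scoreLanguage(model, grams);
                pending.put(entry.getKey(), executor.submit(task));
            }
            // Collect the results; get() blocks until the task has finished.
            Map<String, Double> scores = new TreeMap<>();
            for (Map.Entry<String, Future<Double>> entry : pending.entrySet()) {
                try {
                    scores.put(entry.getKey(), entry.getValue().get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return scores;
        } finally {
            executor.shutdown();
        }
    }
}
```

In a real implementation the executor would presumably be shared rather than created per call, and short inputs could fall back to the sequential path, since the task submission overhead may outweigh the gain there (this is untested as well).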