lingua Detection of long texts is not running parallelized


Open · Marcono1234 opened this issue 2 years ago · 0 comments

Language detection for long texts, or for any text when withLowAccuracyMode() is used, only uses a single worker thread.

The reason is that one work task is submitted per ngram length. However, for long texts, and whenever withLowAccuracyMode() is used, only ngram length 3 is checked, so only a single work task is submitted. One solution might be to perform the per-language computation in computeLanguageProbabilities as a separate work task for each language. However, that approach will probably only be worth it if the input is long enough (I have not verified this).
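To illustrate the proposed split, here is a minimal Java sketch that scores each candidate language in its own task. Even when only trigrams are checked, the work then fans out over several threads instead of one. Everything in it is hypothetical and meant only as an illustration. The class and method names (`PerLanguageScoring`, `computeLanguageScores`, `mostLikely`), the toy probability maps, and the fixed floor probability for unseen trigrams are assumptions. They are not lingua's actual `computeLanguageProbabilities` implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one work task per language instead of one per ngram length. */
public class PerLanguageScoring {

    // Assumed floor probability for trigrams a toy model has never seen.
    static final double UNSEEN_PROBABILITY = 1e-6;

    /** Extracts all overlapping ngrams of the given length from the text. */
    static List<String> ngrams(String text, int length) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + length <= text.length(); i++) {
            result.add(text.substring(i, i + length));
        }
        return result;
    }

    /** Sums the log probabilities of the ngrams under one toy language model. */
    static double score(List<String> ngrams, Map<String, Double> model) {
        double sum = 0.0;
        for (String ngram : ngrams) {
            sum += Math.log(model.getOrDefault(ngram, UNSEEN_PROBABILITY));
        }
        return sum;
    }

    /** Submits one scoring task per language, so the pool can use several threads. */
    static Map<String, Double> computeLanguageScores(
            String text,
            Map<String, Map<String, Double>> models,
            ExecutorService executor) throws Exception {
        // Only ngram length 3 is used, mirroring the long-text / low-accuracy case.
        List<String> trigrams = ngrams(text, 3);
        Map<String, Future<Double>> futures = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> entry : models.entrySet()) {
            Map<String, Double> model = entry.getValue();
            futures.put(entry.getKey(), executor.submit(() -> score(trigrams, model)));
        }
        // Wait for every per-language task and collect its result.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Future<Double>> entry : futures.entrySet()) {
            scores.put(entry.getKey(), entry.getValue().get());
        }
        return scores;
    }

    /** Returns the language with the highest (least negative) log-probability sum. */
    static String mostLikely(Map<String, Double> scores) {
        String best = null;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (best == null || entry.getValue() > scores.get(best)) {
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        // Toy trigram models; the probabilities are made up for illustration.
        Map<String, Map<String, Double>> models = Map.of(
                "English", Map.of("the", 0.05, "hes", 0.01, "ese", 0.01),
                "German", Map.of("the", 0.001, "ese", 0.02));
        int threads = Math.min(models.size(), Runtime.getRuntime().availableProcessors());
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            Map<String, Double> scores = computeLanguageScores("these", models, executor);
            System.out.println(mostLikely(scores)); // prints "English"
        } finally {
            executor.shutdown();
        }
    }
}
```

Whether this pays off depends on how expensive scoring one language is compared with the overhead of submitting and joining a task. As noted above, that trade-off probably only favors per-language tasks for sufficiently long inputs.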

Aug 01 '22 15:08 Marcono1234


