
Force specific language for NLP

Open MatiasConTilde opened this issue 4 years ago • 1 comment

Is your feature request related to a problem? Please describe.

I know that all my documents are in one specific language, but when I run NLP on them, the language is auto-detected and several different languages are reported. I assume this hurts the quality of the results.

Describe the solution you'd like

It would be useful to configure a specific language for NLP instead of auto-detecting it.

Describe alternatives you've considered

A workaround hack I thought of is to replace the NLP model files of the other languages with symlinks to the model for the language we want. Datashare would still think a document is in a different language, but when it loads the model for that language it would actually load the correct one. I haven't tried this, as I fear it could have other consequences.
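For illustration only, the symlink idea could be sketched roughly as below. This is an untested-against-Datashare sketch: the directory layout (one subdirectory per language code inside a models directory) and the function name `link_models_to_language` are hypothetical, not Datashare's actual layout or API. Each original directory is renamed with an `.orig` suffix so the change can be undone.

```python
from pathlib import Path


def link_models_to_language(models_dir, target_lang):
    """Point every other language's model directory at target_lang's models.

    ASSUMPTION: models_dir contains one subdirectory per language code
    (e.g. en/, fr/, es/). This layout is hypothetical.
    """
    root = Path(models_dir)
    target = root / target_lang
    if not target.is_dir():
        raise FileNotFoundError(f"no model directory for {target_lang!r} in {root}")

    linked = []
    for entry in sorted(root.iterdir()):
        # Skip the target itself, existing symlinks, backups, and plain files.
        if entry.name == target_lang or entry.is_symlink():
            continue
        if entry.name.endswith(".orig") or not entry.is_dir():
            continue
        # Keep the original models so the workaround can be reverted.
        entry.rename(entry.with_name(entry.name + ".orig"))
        # Relative link: <models_dir>/<lang> -> <target_lang>
        entry.symlink_to(target_lang, target_is_directory=True)
        linked.append(entry.name)
    return linked
```

To revert, remove each symlink and rename the matching `.orig` directory back. Note that this does not change which language is detected, only which model files are found for it.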

MatiasConTilde, Sep 23 '20 14:09

Regarding language detection, another issue was opened: https://github.com/ICIJ/datashare/issues/781

Soliine, Mar 31 '21 08:03

Closed in favor of #938

mvanzalu, Sep 13 '22 08:09