
[feature request] Chinese support

Open so2liu opened this issue 1 year ago • 1 comments

This is the list of supported languages: https://github.com/deepset-ai/haystack/blob/3e6def7e03097021c8efd1b5c277bec6e541c162/haystack/preprocessor/preprocessor.py#L17

Chinese is missing

so2liu avatar Nov 24 '23 15:11 so2liu

@so2liu Thank you for bringing up this feature request. Let's see whether somebody in our open source community with expertise in the Chinese language wants to contribute here. The line of code where we run iso639_to_nltk.get(language, language) shouldn't cause a big problem, though. The dict iso639_to_nltk doesn't contain cn or chinese, but if the key is not found, .get(language, language) returns language unchanged. Have you tried setting language to chinese?
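
For readers unfamiliar with the fallback described above, here is a minimal sketch of how `dict.get(key, default)` behaves when the key is missing. The dictionary below is an illustrative stand-in with assumed entries, not a copy of Haystack's actual `iso639_to_nltk`, and `resolve_language` is a hypothetical helper introduced only for this example. The sketch covers the lookup alone; whether a tokenizer model exists for the returned name is a separate question it does not address.

```python
# Illustrative stand-in for the mapping discussed above. The entries are
# assumptions chosen for the example, not Haystack's actual dictionary.
iso639_to_nltk = {"en": "english", "de": "german", "fr": "french"}


def resolve_language(language: str) -> str:
    """Hypothetical helper: map an ISO 639 code to a language name.

    If the key is absent, dict.get returns the second argument, so the
    input string is passed through unchanged instead of raising KeyError.
    """
    return iso639_to_nltk.get(language, language)


if __name__ == "__main__":
    for candidate in ("en", "cn", "chinese"):
        print(candidate, "->", resolve_language(candidate))
```

With this stand-in mapping, a known code such as `en` is translated, while `cn` and `chinese` come back as they were passed in, so the lookup step itself does not reject an unlisted language.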

julian-risch avatar Dec 05 '23 13:12 julian-risch