dl-translate
Library for translating between 200 languages. Built on 🤗 transformers.
Right now `TranslationModel.translate` will translate each input string as is, which can be extremely slow for longer sequences due to the quadratic runtime of the architecture. The current recommended way...
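One way to avoid feeding a single long sequence to the model is to split the text into sentences and translate them as a batch. The sketch below is only an illustration under stated assumptions: `split_sentences` and `translate_by_sentence` are hypothetical helpers, not part of dl-translate, the regex splitter is a naive stand-in for a proper sentence tokenizer, and `translate_batch` is any callable that maps a list of strings to a list of translated strings.

```python
import re
from typing import Callable, List

# Naive boundary: whitespace that follows '.', '!' or '?'.
# A real sentence tokenizer handles abbreviations and other scripts better.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into rough sentences (hypothetical helper)."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def translate_by_sentence(
    text: str,
    translate_batch: Callable[[List[str]], List[str]],
) -> str:
    """Translate sentence by sentence, then join the results with spaces."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    return " ".join(translate_batch(sentences))
```

With a loaded model, `translate_batch` could be something like `lambda batch: mt.translate(batch, source=..., target=...)`, assuming `translate` accepts a list of strings. Each sentence is then a short sequence, so the quadratic cost applies per sentence rather than to the whole document.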
That would be nice for Kaggle/Colab/GCP users. Unfortunately I'm not too familiar with XLA so it might take a while before I take a stab at that.
The [langdetect](https://github.com/Mimino666/langdetect) library has worked well for me in the past for language detection problems. How would you feel about allowing users to pass `'auto'` as an option for `source`? I...
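A rough sketch of how an `'auto'` option might be resolved before calling the model is shown here. `resolve_source` is a hypothetical helper, not an existing dl-translate function. It assumes langdetect's `detect(text)`, which returns a language code such as `"fr"`. Whether the translation model accepts that code directly is not confirmed, so a mapping step may be needed.

```python
from typing import Callable, Optional


def resolve_source(
    text: str,
    source: str,
    detect: Optional[Callable[[str], str]] = None,
) -> str:
    """Return `source` unchanged unless it is 'auto' (hypothetical helper).

    When `source` is 'auto', the language is guessed from `text` using
    `detect`, which defaults to langdetect's `detect` function.
    """
    if source != "auto":
        return source
    if detect is None:
        # Imported lazily so langdetect is only required for 'auto'.
        import langdetect

        detect = langdetect.detect
    return detect(text)
```

Passing the detector as a parameter keeps the helper independent of any one detection library and makes it easy to swap in a different one.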
I'm using the official nllb200_distilled_600M model (loaded from the cache, not an offline download) and running the following program:
``` python
import dl_translate as dlt
import nltk

nltk.data.path.append(r"E:\xxx\nltk_data")
mt = dlt.TranslationModel("nllb200")
mt = dlt.TranslationModel("facebook/nllb-200-distilled-600M")
text =...
```
FYI: changes in the [transformers tokenizer](https://github.com/huggingface/transformers/issues/31884) give a deprecation warning.

> /xxx/dltranslate/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers...
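Until the tokenizer option is set explicitly, one possible stopgap on the user side is to filter this specific warning with the standard `warnings` module. This is a workaround sketch, not a fix in the library. `silence_cleanup_spaces_warning` is a hypothetical helper name, and the filter assumes the message is raised as a `FutureWarning`, as the log above shows.

```python
import warnings


def silence_cleanup_spaces_warning() -> None:
    """Ignore only the FutureWarning that mentions clean_up_tokenization_spaces."""
    warnings.filterwarnings(
        "ignore",
        # The pattern is a regex matched against the start of the message.
        message=r".*clean_up_tokenization_spaces.*",
        category=FutureWarning,
    )
```

Call it once before loading the model. Other `FutureWarning` messages are left untouched, so unrelated deprecations remain visible.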