dl-translate issues

Support for sentence splitting

3

Right now `TranslationModel.translate` will translate each input string as is, which can be extremely slow for longer sequences due to the quadratic runtime of the architecture. The current recommended way...
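A minimal sketch of the idea, assuming sentences can be translated independently and rejoined with spaces. The names `split_sentences` and `translate_by_sentence` are hypothetical helpers, not part of dl-translate, and the regex splitter is a naive stand-in for a proper sentence tokenizer:

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_by_sentence(
    text: str,
    translate_batch: Callable[[List[str]], List[str]],
    splitter: Callable[[str], List[str]] = split_sentences,
) -> str:
    """Split text into sentences, translate them as one batch, and rejoin.

    translate_batch is any callable mapping a list of strings to a list of
    translated strings (hypothetical interface, supplied by the caller).
    """
    sentences = splitter(text)
    if not sentences:
        return ""
    return " ".join(translate_batch(sentences))
```

Each sentence is then a short sequence, so the cost grows roughly with the number of sentences rather than with the square of the full text length. The trade-off is that the model no longer sees context across sentence boundaries.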

xhluca

enhancement

help wanted

Add a command line interface

xhluca

enhancement

help wanted

Add support for TPU

That would be nice for Kaggle/Colab/GCP users. Unfortunately I'm not too familiar with XLA so it might take a while before I take a stab at that.

xhluca

enhancement

help wanted

Detect source language with langdetect package

5

The [langdetect](https://github.com/Mimino666/langdetect) package has worked well for me in the past for language detection problems. How would you feel about allowing users to pass `'auto'` as an option for `source`? I...
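One way the proposal could look, as a sketch only. `resolve_source` and `detect_with_langdetect` are hypothetical names, not dl-translate API. The only third-party call used is `langdetect.detect`, which returns a language code such as `'fr'`; mapping that code to whatever the translation model expects is left out and would need to be handled separately:

```python
from typing import Callable


def detect_with_langdetect(text: str) -> str:
    """Default detector: defer to the langdetect package (imported lazily)."""
    from langdetect import detect  # third-party; only needed when source='auto'

    return detect(text)


def resolve_source(
    text: str,
    source: str,
    detector: Callable[[str], str] = detect_with_langdetect,
) -> str:
    """Return source unchanged unless it is 'auto', then detect it from text."""
    if source != "auto":
        return source
    return detector(text)
```

Keeping the detector injectable means langdetect stays an optional dependency, and users who never pass `'auto'` do not need it installed.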

awalker88

enhancement

help wanted

good first issue

[BUG]: nllb200_distilled_600M official not running properly

I'm using the official nllb200_distilled_600M model (loaded from the cache, not an offline download) and running the following program:

```python
import dl_translate as dlt
import nltk

nltk.data.path.append(r"E:\xxx\nltk_data")
mt = dlt.TranslationModel("nllb200")
mt = dlt.TranslationModel("facebook/nllb-200-distilled-600M")
text =...
```

Jason-JP-Yang

Tokenizer clean_up_tokenization_spaces

5

FYI: changes in the [transformers tokenizer](https://github.com/huggingface/transformers/issues/31884) now trigger a deprecation warning.

> /xxx/dltranslate/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers...
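Until the library sets the option explicitly, a user-side stopgap could be to filter just this one message with the standard `warnings` module. This is a sketch of a workaround, not a fix proposed in the issue, and `silence_cleanup_warning` is a hypothetical helper name:

```python
import warnings


def silence_cleanup_warning() -> None:
    """Ignore only the FutureWarning about clean_up_tokenization_spaces.

    The message argument is a regex matched against the start of the
    warning text, so other FutureWarnings are still shown.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"`clean_up_tokenization_spaces` was not set",
        category=FutureWarning,
    )
```

Call it once before loading the model. It hides the message but does not change tokenizer behavior.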

Nickwiz
