[BUG]: nllb200_distilled_600M official not running properly
I'm using the official nllb200_distilled_600M model (loaded via the cache, not downloaded for offline use) and running the following program:
import dl_translate as dlt
import nltk

# use a local copy of the nltk tokenizer data
nltk.data.path.append(r"E:\xxx\nltk_data")

# the second call replaces the first
mt = dlt.TranslationModel("nllb200")
mt = dlt.TranslationModel("facebook/nllb-200-distilled-600M")
text = "This paper presents a literature survey on existing disparity map algorithms. It focuses on four main stages of processing as proposed by Scharstein and Szeliski in a taxonomy and evaluation of dense two-frame stereo correspondence algorithms performed in 2002. To assist future researchers in developing their own stereo matching algorithms, a summary of the existing algorithms developed for every stage of processing is also provided. The survey also notes the implementation of previous software-based and hardware-based algorithms. Generally, the main processing module for a software-based implementation uses only a central processing unit. By contrast, a hardware-based implementation requires one or more additional processors for its processing module, such as graphical processing unit or a field programmable gate array. This literature survey also presents a method of qualitative measurement that is widely used by researchers in the area of stereo vision disparity mappings. "
# split the paragraph into sentences, translate them, and join the results
sents = nltk.tokenize.sent_tokenize(text, "english")
print("".join(mt.translate(sents, source="eng_Latn", target="zho_Hans")))
The following error occurs:
$ File "E:\xxx\translation.py", line 16, in <module>
$ print("".join(mt.translate(sents, source=dlt.lang.ENGLISH, target="zho_Hans")))
$ File "E:\xxx\Anaconda3\envs\DLTranslation\lib\site-packages\dl_translate\_translation_model.py", line 173, in translate
$ "forced_bos_token_id", self._tokenizer.lang_code_to_id[target]
$ AttributeError: 'NllbTokenizerFast' object has no attribute 'lang_code_to_id'
But when I use another model such as m2m100, there is no problem. I really need help!
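
For reference, the traceback suggests that the installed transformers version no longer exposes lang_code_to_id on NllbTokenizerFast (newer releases dropped that mapping). Below is a minimal sketch of a possible workaround that bypasses dl_translate and calls transformers directly; it assumes the NLLB language codes ("eng_Latn", "zho_Hans") are still ordinary tokens in the tokenizer vocabulary, so convert_tokens_to_ids can supply forced_bos_token_id:

# Sketch of a possible workaround (not dl_translate's own API): use transformers
# directly and look up the target language token with convert_tokens_to_ids
# instead of the removed lang_code_to_id mapping.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer(
    "This paper presents a literature survey on existing disparity map algorithms.",
    return_tensors="pt",
)
generated = model.generate(
    **inputs,
    # the language code is a regular vocabulary token, so this recovers its id
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("zho_Hans"),
    max_length=200,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])

If that lookup works, maybe the same convert_tokens_to_ids call could replace lang_code_to_id inside _translation_model.py, but I'm not sure whether that is the fix the maintainers would want.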