
[Feature Request]: Meta SeamlessM4T support

Open zhongwei opened this issue 1 year ago • 7 comments

zhongwei avatar Aug 23 '23 05:08 zhongwei

With the latest update of Meta's SeamlessM4T, translator.py can detect whether you have a GPU or only a CPU; if no GPU is present, it falls back to the CPU. I could run text-to-text translation on the CPU; I have not tried audio-to-text or text-to-audio. From my review of the scripts yesterday, translator.py now has these checks.

```
$ m4t_predict "what's up" t2tt cmn --src_lang eng --model_name 'seamlessM4T_medium'
2023-09-03 05:21:14,576 INFO -- m4t_scripts.predict.predict: Running inference on the CPU in torch.float32.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set `force=True` to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set `force=True` to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set `force=True` to download again.
2023-09-03 05:21:36,193 INFO -- m4t_scripts.predict.predict: Translated text in cmn: 有什么问题?
```

(The cmn output means roughly "What's the problem?")

maxng07 avatar Sep 03 '23 05:09 maxng07

Reviewing this again, the change in the script is not in translator.py but in predict.py: https://github.com/facebookresearch/seamless_communication/blob/main/scripts/m4t/predict/predict.py, lines 62 to 70:

```python
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    dtype = torch.float16
    logger.info(f"Running inference on the GPU in {dtype}.")
else:
    device = torch.device("cpu")
    dtype = torch.float32
    logger.info(f"Running inference on the CPU in {dtype}.")
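The selection rule quoted above is easy to isolate. Here is a minimal, torch-free sketch of the same logic (the helper name `select_device` is made up for illustration; it returns the device string and dtype name that predict.py would pick):

```python
def select_device(cuda_available: bool) -> tuple[str, str]:
    """Mirror predict.py's fallback: GPU gets half precision, CPU gets full precision."""
    if cuda_available:
        return "cuda:0", "float16"  # run on the first GPU in half precision
    return "cpu", "float32"         # fall back to the CPU in full precision
```

The key design point is that the dtype changes along with the device: float16 is fast on CUDA hardware but poorly supported for CPU inference, so the CPU path uses float32.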

It detects whether a GPU is present and falls back to the CPU if not. I was able to run text-to-text translation on the CPU, though it is single-threaded on my 8-core machine. Again, I haven't tried audio-to-text or vice versa, but it does look like SeamlessM4T runs on the CPU without any extra work.

maxng07 avatar Sep 04 '23 03:09 maxng07

I did a quick test of text-to-audio/speech; it works on the CPU too:

```
$ m4t_predict "brother, where are you?" t2st ind --src_lang eng --model_name 'seamlessM4T_medium' --output_path /test.mp3
2023-09-04 04:39:20,577 INFO -- m4t_scripts.predict.predict: Running inference on the CPU in torch.float32.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set `force=True` to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set `force=True` to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set `force=True` to download again.
2023-09-04 04:40:10,159 INFO -- m4t_scripts.predict.predict: Saving translated audio in ind
2023-09-04 04:40:10,174 INFO -- m4t_scripts.predict.predict: Translated text in ind: Saudara, di mana Anda?
```
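For anyone trying other modes: the second positional argument in these commands is a task code that selects the pipeline. A small sketch (the helper name `describe_task` is hypothetical; the list of codes follows the seamless_communication README):

```python
# Task codes accepted by m4t_predict, per the seamless_communication README.
TASKS = {
    "s2st": "speech-to-speech translation",
    "s2tt": "speech-to-text translation",
    "t2st": "text-to-speech translation",  # the mode tested above
    "t2tt": "text-to-text translation",
    "asr": "automatic speech recognition",
}

def describe_task(code: str) -> str:
    """Return a human-readable description of a task code, rejecting unknown ones."""
    try:
        return TASKS[code.lower()]
    except KeyError:
        raise ValueError(f"unknown task code: {code!r}") from None
```

So the two experiments in this thread covered t2tt and t2st; s2tt, s2st, and asr are the audio-input modes that remain untested on CPU here.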

maxng07 avatar Sep 04 '23 04:09 maxng07

Meta appears to have done this themselves with the new M4Tv2 release: https://github.com/facebookresearch/seamless_communication/tree/main/ggml

bakkot avatar Dec 01 '23 22:12 bakkot

@ggerganov did you know about this?

Green-Sky avatar Dec 01 '23 23:12 Green-Sky

> @ggerganov did you know about this?

Huh, no - very cool!

We should help with the implementation

ggerganov avatar Dec 02 '23 06:12 ggerganov

Nice, it even works!

[screenshot]

It would be great to bring it up to date with the latest ggml; that would reduce memory usage and enable GPU support, among other improvements. In any case, having a working implementation is of great help! Very cool to see this from the Meta team ❤️

ggerganov avatar Dec 02 '23 07:12 ggerganov