[Feature Request]: Meta SeamlessM4T support
With the latest update of Meta's SeamlessM4T, translator.py can detect whether you have a GPU or only a CPU; if no GPU is available, it falls back to the CPU. I could run text-to-text translation on the CPU, though I have not tried audio-to-text or text-to-audio. From what I reviewed in the scripts yesterday, translator.py now includes these checks.
m4t_predict "what's up" t2tt cmn --src_lang eng --model_name 'seamlessM4T_medium'
2023-09-03 05:21:14,576 INFO -- m4t_scripts.predict.predict: Running inference on the CPU in torch.float32.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set force=True to download again.
2023-09-03 05:21:36,193 INFO -- m4t_scripts.predict.predict: Translated text in cmn: 有什么问题?
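For reference, the same text-to-text call can also be made from Python. This is only a sketch based on the Translator API shown in the seamless_communication README around that time; the exact import path and predict() signature are assumptions and may have changed since:

import torch
from seamless_communication.models.inference import Translator

# No GPU here, so mirror predict.py's fallback: CPU + float32.
device = torch.device("cpu")
dtype = torch.float32

# Load the multitask model and the vocoder (model card names from the logs above).
translator = Translator("seamlessM4T_medium", "vocoder_36langs", device, dtype)

# t2tt: text-to-text translation, English -> Mandarin.
translated_text, _, _ = translator.predict("what's up", "t2tt", "cmn", src_lang="eng")
print(translated_text)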
Reviewing this again, the change is not in translator.py but in predict.py (https://github.com/facebookresearch/seamless_communication/blob/main/scripts/m4t/predict/predict.py), lines 62 to 70:

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    dtype = torch.float16
    logger.info(f"Running inference on the GPU in {dtype}.")
else:
    device = torch.device("cpu")
    dtype = torch.float32
    logger.info(f"Running inference on the CPU in {dtype}.")
It detects whether a GPU is present and falls back to the CPU if not. I was able to run text-to-text translation on the CPU; it's single-threaded on my 8-core machine. Again, I haven't tried audio-to-text or vice versa, but it does look like SeamlessM4T works on the CPU without any extra work.
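Since inference looked single-threaded, it may be worth checking PyTorch's intra-op thread setting. A minimal sketch using standard PyTorch calls (untested with this model, so it is only an assumption that the bottleneck is in PyTorch ops):

import torch

# PyTorch's intra-op thread pool usually defaults to the number of physical
# cores; if this prints 1, raising it may speed up CPU inference.
print(torch.get_num_threads())
torch.set_num_threads(8)  # e.g. match the 8 cores mentioned above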
I did a quick test of text-to-audio/speech; it works on the CPU too:
m4t_predict "brother, where are you?" t2st ind --src_lang eng --model_name 'seamlessM4T_medium' --output_path /test.mp3
2023-09-04 04:39:20,577 INFO -- m4t_scripts.predict.predict: Running inference on the CPU in torch.float32.
Using the cached checkpoint of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached tokenizer of the model 'seamlessM4T_medium'. Set force=True to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set force=True to download again.
2023-09-04 04:40:10,159 INFO -- m4t_scripts.predict.predict: Saving translated audio in ind
2023-09-04 04:40:10,174 INFO -- m4t_scripts.predict.predict: Translated text in ind: Saudara, di mana Anda?
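Here is the equivalent text-to-speech call sketched through the Python API. Same caveat as above: the Translator interface, the (text, wav, sr) return shape for speech tasks, and the waveform indexing are assumptions based on the README of that era:

import torch
import torchaudio
from seamless_communication.models.inference import Translator

translator = Translator(
    "seamlessM4T_medium", "vocoder_36langs", torch.device("cpu"), torch.float32
)

# t2st: text-to-speech translation, English -> Indonesian.
text, wav, sr = translator.predict(
    "brother, where are you?", "t2st", "ind", src_lang="eng"
)
print(text)
# Assumes wav is batched as [1, channels, frames]; writing WAV avoids the
# backend issues that saving MP3 can hit with some torchaudio builds.
torchaudio.save("/test.wav", wav[0].cpu(), sample_rate=sr)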
Meta appears to have done this themselves with the new M4Tv2 release: https://github.com/facebookresearch/seamless_communication/tree/main/ggml
@ggerganov did you know about this?
Huh, no - very cool!
We should help with the implementation
Nice, it even works!
It would be great to bring it up-to-date with the latest ggml. That would reduce memory usage and enable GPU support, among other improvements. In any case, having a working implementation is of great help! Very cool to see this from the Meta team ❤️