
SeamlessM4T_large Model Produces Gibberish Output in Colab

Open pratikshappai opened this issue 2 years ago • 8 comments

Description

When running the SeamlessM4T_large model in a Colab notebook, the output is repetitive gibberish. The issue does not occur when running the same model in Hugging Face Spaces.

Steps to Reproduce

  1. Run the SeamlessM4T_large model in a Colab notebook.
  2. Feed an audio chunk of approximately 14 seconds into the model.

Expected Result

The model should produce a coherent output.

Actual Result

The model outputs gibberish and falls into a repetitive loop, repeating the same few phrases until the end of the audio.

Additional Info
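
For reference, here is a minimal Colab repro sketch. It is not taken from the original report: the card names, task string, language code, and audio path are placeholders, and the Translator constructor follows the repo's README example from around this time (signatures may differ in later versions).

    import torch
    from seamless_communication.models.inference import Translator

    # Load SeamlessM4T-Large and its vocoder on the Colab GPU
    # (card names as in the README of the time; adjust to your setup).
    translator = Translator(
        "seamlessM4T_large",
        vocoder_name_or_card="vocoder_36langs",
        device=torch.device("cuda:0"),
        dtype=torch.float16,
    )

    # Feed a ~14 second audio chunk for speech-to-text translation into English.
    translated_text, wav, sr = translator.predict("audio_chunk_14s.wav", "s2tt", "eng")
    print(translated_text)  # in Colab this comes back repetitive / gibberish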

pratikshappai avatar Aug 27 '23 20:08 pratikshappai

Could you try setting --ngram-filtering to True? See https://github.com/facebookresearch/seamless_communication/blob/main/scripts/m4t/predict/predict.py#L50-L55
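
For anyone following along, a minimal sketch of passing that option (the translator object is the one from the repro sketch above; the ngram_filtering keyword mirrors the --ngram-filtering flag forwarded by the linked script):

    # Via the Python API, reusing the translator from the repro sketch above:
    translated_text, wav, sr = translator.predict(
        "audio_chunk_14s.wav", "s2tt", "eng", ngram_filtering=True
    )

    # Or, when running scripts/m4t/predict/predict.py directly, add:
    #   --ngram-filtering True

Judging by the name, the option filters out repeated n-grams during decoding, which is exactly the failure mode described above.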

kauterry avatar Aug 28 '23 15:08 kauterry

I attempted to change the default of the ngram_filtering parameter to True in the Translator.predict method. Despite making the changes in the source code and reinstalling the package, they don't seem to take effect.

Steps to Reproduce

  1. Modified ngram_filtering in predict.py and the argument parser.
    parser.add_argument("--ngram-filtering", type=bool, default=True)
    
    translated_text, wav, sr = translator.predict(
        args.input, args.task, args.tgt_lang, src_lang=args.src_lang, ngram_filtering=True
    )
    
  2. Ran pip install --upgrade --force-reinstall .
  3. Reloaded the inference module.
    from importlib import reload
    import seamless_communication.models.inference
    reload(seamless_communication.models.inference)
    

Expected Behavior

help(Translator.predict) should reflect the change (ngram_filtering set to True).

Actual Behavior

help(Translator.predict) still shows ngram_filtering as False. The problem persists :/
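
For context, the help() output is expected Python behaviour rather than a packaging problem: reload() re-executes only the module object you pass it; it does not rebind names imported earlier, and other modules already cached in sys.modules are left untouched, so the Translator class being inspected is still the previously loaded one. A small sketch of the effect:

    import importlib
    import seamless_communication.models.inference as inference_mod
    from seamless_communication.models.inference import Translator  # binds the currently loaded class

    importlib.reload(inference_mod)  # re-executes this one module; it does not rebind the
                                     # `Translator` name above or reload other cached modules

    help(Translator.predict)  # still inspects the old class object, hence the old default

The simplest workarounds are to restart the Colab runtime after the reinstall so everything is re-imported fresh, or to skip editing the package entirely and pass ngram_filtering=True at the call site, since predict accepts it as a keyword argument.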

pratikshappai avatar Aug 28 '23 19:08 pratikshappai

Dear @pratiksha-pai and @kauterry, I'm facing this problem too: I get noticeably worse results than the Hugging Face Space, even though both call the SeamlessM4T-Large model! I also tried setting ngram_filtering to True, but it didn't help.

Please let me know if there is a solution. Thank you in advance.

Arash Dehghani

arash-aut avatar Aug 30 '23 11:08 arash-aut

Hi @kauterry, I wanted to follow up on this issue. Could you please let me know if I need to fix anything in particular? Thanks so much for the help so far!

pratikshappai avatar Sep 06 '23 02:09 pratikshappai

Hi @pratiksha-pai I think this issue no longer exists! I tested the model for Persian and the results match the demo. Could you try running inference again to see if anything has changed?

arash-aut avatar Sep 11 '23 06:09 arash-aut

Oh, let me give this another try then, thanks for letting me know.

pratikshappai avatar Sep 12 '23 14:09 pratikshappai

I get the same issue: poor results, with gibberish sentences repeating in a loop. Any suggestions?

projects-g avatar Sep 15 '23 15:09 projects-g

Same issue with Chinese-to-English T2TT.

aliencaocao avatar Mar 01 '24 17:03 aliencaocao