Maha Elbayad

Comments by Maha Elbayad

Do you mean showing the alignment between the input and the output? What modalities/tasks are you looking at?

Hi @barinov274! Although we use an encoder-decoder architecture like Whisper, we didn't train for ASR with timestamp tokens. Our focus is translation, and ASR is treated as S2TT...

Thank you @kauterry for drafting this PR. @yilinyang7, the main thing I thought was missing in SC is this: `speech_output, text_output = expressive_translator.predict(input)`, where a user wouldn't...
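
To make the suggestion concrete, here is a minimal Python sketch of that one-call interface. The class name, constructor arguments, and output containers are illustrative assumptions, not the library's confirmed API; only the `predict` call shape comes from the comment above.

```python
# Hypothetical sketch of the single-call API shape suggested above.
# Class names, constructor arguments, and fields are illustrative; only
# `speech_output, text_output = translator.predict(input)` is from the comment.
from dataclasses import dataclass


@dataclass
class SpeechOutput:
    waveform: list[float]  # synthesized audio samples (placeholder type)
    sample_rate: int


@dataclass
class TextOutput:
    text: str  # translated text


class ExpressiveTranslator:
    """Hides model + vocoder wiring behind a single predict() call."""

    def __init__(self, model_name: str, vocoder_name: str, device: str = "cpu"):
        self.model_name = model_name
        self.vocoder_name = vocoder_name
        self.device = device
        # Model/vocoder loading elided in this sketch.

    def predict(self, input_audio: str) -> tuple[SpeechOutput, TextOutput]:
        # 1) encode the source speech
        # 2) translate (text decoding + unit decoding)
        # 3) vocode the units into a waveform
        raise NotImplementedError("sketch only")


# Intended user-facing ergonomics, mirroring the comment:
# translator = ExpressiveTranslator("expressive_model", "vocoder")
# speech_output, text_output = translator.predict("input.wav")
```

The point of the sketch is ergonomics: the user gets both outputs from one call without having to assemble the model and vocoder components themselves.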

@maherr13, max_seq_len in the T2TT model is set to 1024 subword tokens (see the [NLLB dense_1b config](https://github.com/facebookresearch/fairseq2/blob/c0107bd8a1ebfc2514a8b5f4e64725d1e05c28db/src/fairseq2/models/nllb/builder.py#L88)). That said, sentence-level MT training data is usually short (on average...
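
To illustrate what that 1024-token budget means in practice, here is a hedged Python sketch that counts subwords with SentencePiece and naively splits over-long inputs at sentence boundaries. The tokenizer path and the splitting heuristic are assumptions for the example, not project guidance.

```python
# Sketch: check whether an input would exceed the T2TT max_seq_len of 1024
# subword tokens, and naively split long inputs at sentence boundaries.
# The SentencePiece model path is a placeholder; any NLLB-compatible
# .model file would do.
import re

import sentencepiece as spm

MAX_SEQ_LEN = 1024  # subword tokens, per the NLLB dense_1b config

sp = spm.SentencePieceProcessor(model_file="nllb_tokenizer.model")  # placeholder path


def split_if_too_long(text: str) -> list[str]:
    """Return [text] if it fits the budget, else a naive sentence-level split."""
    if len(sp.encode(text)) <= MAX_SEQ_LEN:
        return [text]
    # Crude sentence segmentation; a real pipeline would use a proper splitter.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sent in sentences:
        candidate = f"{current} {sent}".strip()
        if len(sp.encode(candidate)) > MAX_SEQ_LEN and current:
            chunks.append(current)  # flush the chunk before it overflows
            current = sent
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Note the caveat baked into the sketch: a single sentence longer than 1024 subwords would still overflow its chunk, which is consistent with the point that sentence-level MT training data is usually well under this limit.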