Proyag Pal

Results: 2 issues by Proyag Pal

The main addition in this PR is modified decoding with specific kinds of target factors. This is designed for automatic dubbing models (https://iwslt.org/2023/dubbing), where we are dealing with numeric factors...

I'm trying to do some AutoModel pre-training to see how throughput scales from 4 GPUs to 8 GPUs, but what I'm seeing in the throughput metrics has confused me, so...