Taurus

3 issues by Taurus

It's very interesting work, but I was wondering how many GPUs you used when training videoMoco. We found that if we use more than 24 GPUs (V100 32G), the training process would be...

I am using AttentionSeq2Seq for a token normalization task. The input is a batch of one-hot encoded data, and the output is a distribution over the output vocabulary, with batch size = 128....
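For reference, here is a minimal sketch of the tensor shapes this setup implies, assuming NumPy. Only the batch size of 128 comes from the issue; the sequence length and vocabulary sizes are hypothetical. The sketch does not call AttentionSeq2Seq itself: the random logits merely stand in for the decoder output, and the softmax turns them into a per-step distribution over the output vocabulary.

```python
import numpy as np

def one_hot_batch(token_ids, vocab_size):
    """One-hot encode integer token ids of shape (batch, seq_len)
    into a float array of shape (batch, seq_len, vocab_size)."""
    token_ids = np.asarray(token_ids)
    return np.eye(vocab_size, dtype=np.float32)[token_ids]

def softmax(logits, axis=-1):
    """Numerically stable softmax: each slice along `axis` sums to 1."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical sizes; only batch_size = 128 is taken from the issue.
batch_size, seq_len, in_vocab, out_vocab = 128, 10, 50, 60
rng = np.random.default_rng(0)

ids = rng.integers(0, in_vocab, size=(batch_size, seq_len))
x = one_hot_batch(ids, in_vocab)  # model input: (128, 10, 50)

# Random logits stand in for the model's decoder output (assumption).
logits = rng.normal(size=(batch_size, seq_len, out_vocab))
probs = softmax(logits)           # (128, 10, 60), one distribution per step

print(x.shape, probs.shape)
```

Each one-hot row contains a single 1, and each row of `probs` sums to 1, which is the shape contract a sequence model trained with categorical cross-entropy would expect.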

Would you mind sharing some detailed training metrics, such as the gradient norm?
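As a point of reference for what "grad norm" usually means, here is a minimal sketch of the global gradient norm: the L2 norm of all parameter gradients treated as one flattened vector. It is written in plain Python for illustration; the function name and toy gradients are hypothetical. In PyTorch, `torch.nn.utils.clip_grad_norm_` returns this total norm (with its default `norm_type=2`), so logging its return value each step is one common way to track it.

```python
import math

def global_grad_norm(grads):
    """L2 norm of all gradients treated as a single flat vector.

    `grads` is a list of per-parameter gradients, each given as a
    flat list of floats (a hypothetical format for illustration).
    """
    return math.sqrt(sum(g * g for grad in grads for g in grad))

# Toy gradients for two hypothetical parameters.
grads = [[3.0, 0.0], [4.0]]
print(global_grad_norm(grads))  # 5.0
```

Per-parameter norms (the same computation applied to each entry of `grads` separately) can also help pinpoint which layer is responsible when the global norm spikes.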