Jianshu Zhang

Results 2 issues of Jianshu Zhang

I'm really curious: how similar is the synthetic data to the raw data?

Does anyone know why the shape of outputs.attentions[0][-1] is [1, 754, 28, 28]? 754 is the total number of tokens in the inputs and the current outputs, but I wonder what's 28, 28...