Jianshu Zhang
2 issues
Really curious: how similar is the synthetic data to the raw data?
Does anyone know why the shape of outputs.attentions[0][-1] is [1, 754, 28, 28]? 754 is the total number of tokens in the inputs and current outputs; I wonder what the 28, 28 are...