LLaVA-NeXT
Query about the dimension of outputs.attentions
Does anyone know why the shape of outputs.attentions[0][-1] is [1, 754, 28, 28]?
754 is the total number of tokens in the input plus the outputs generated so far,
but what are the two 28s here for?
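For reference, here is a minimal sketch of how one might obtain such an attentions output with the Hugging Face transformers LLaVA-NeXT classes; the checkpoint id, image path, and prompt below are placeholder assumptions, not taken from my actual run:

```python
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

# Placeholder checkpoint; any LLaVA-NeXT checkpoint on the Hub should behave similarly.
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda:0")

# Placeholder image and prompt (prompt format follows the Mistral-based checkpoint).
image = Image.open("example.jpg")
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda:0")

outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    output_attentions=True,        # request attention maps from every layer
    return_dict_in_generate=True,  # needed so outputs.attentions is populated
)

# outputs.attentions is a tuple over generation steps, each itself a tuple over
# layers; the documented per-layer shape is [batch, num_heads, query_len, key_len].
print(outputs.attentions[0][-1].shape)
```

Note that the transformers docs describe each per-layer attention tensor as [batch_size, num_heads, sequence_length, sequence_length], which is part of why the [1, 754, 28, 28] shape confuses me.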