
Need help understanding hidden_states of computer vision models

Open kanlions opened this issue 2 years ago • 0 comments

I am having trouble interpreting how hidden_states and last_hidden_state are indexed in transformer models for computer vision.

Which layer's output is last_hidden_state? For example, with Swin Transformer tiny, hidden_states is a tuple of 5 tensors with sizes 3136x96, 784x192, 196x384, 49x768 and 49x768 respectively. I inspected them, but none of the entries in hidden_states matched last_hidden_state. I ran into a similar problem with ViT models.

Please help me understand these embeddings from the model output class, especially for computer vision transformers, as I am trying to get some interpretability out of the model outputs. The index of the optional initial embedding output is also confusing.
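For reference, here is a minimal sketch that reproduces the shapes above and tests one possible explanation. It assumes a randomly initialized `SwinModel` built from the default `SwinConfig` (which, as far as I can tell, matches the Swin-tiny layout) rather than a pretrained checkpoint, and it assumes that `last_hidden_state` is `hidden_states[-1]` passed through the model's final `layernorm` module, with index 0 being the patch-embedding output. Treat both as assumptions to verify against your installed version of the library.

```python
import torch
from transformers import SwinConfig, SwinModel

# Assumption: default SwinConfig (embed_dim=96, depths [2, 2, 6, 2],
# 224x224 input, patch size 4) stands in for Swin-tiny. Weights are random.
config = SwinConfig()
model = SwinModel(config).eval()

pixel_values = torch.randn(1, 3, 224, 224)  # dummy image batch

with torch.no_grad():
    outputs = model(pixel_values, output_hidden_states=True)
    hidden_states = outputs.hidden_states

    # Index 0 is the embedding output; indices 1..4 follow the four stages.
    shapes = [tuple(h.shape[1:]) for h in hidden_states]
    for i, shape in enumerate(shapes):
        print(i, shape)

    # Assumption under test: last_hidden_state is the final stage output
    # after the model's final LayerNorm, not the raw tuple entry.
    recovered = model.layernorm(hidden_states[-1])
    matches = torch.allclose(recovered, outputs.last_hidden_state, atol=1e-5)

print("layernorm(hidden_states[-1]) matches last_hidden_state:", matches)
```

If the check passes, that would explain why no entry of the tuple equals `last_hidden_state` directly. The same check can be tried on `ViTModel`, which also exposes a final `layernorm` module.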

Thanks in advance

kanlions, Jul 21 '22 13:07