7AtAri

I passed output_hidden_states=True; you can then read the text embeddings from output.hidden_states. You get the image embeddings by calling the vision encoder directly. At least, that is what I figured out...
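To make the pattern concrete, here is a minimal, dependency-free sketch. Everything in it is a hypothetical stand-in rather than a real library API: ToyEncoder, ToyOutput, ToyVisionLanguageModel and the attribute names language_model and vision_encoder are made up for illustration. Only the output_hidden_states flag and the hidden_states attribute come from the comment above. The sketch assumes the common convention that hidden_states is a tuple holding the embedding output followed by the output of each layer; check how your actual model lays it out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = List[float]
States = List[Vector]  # one vector per token or image patch


@dataclass
class ToyOutput:
    last_hidden_state: States
    # Filled only when output_hidden_states=True: the embedding output
    # followed by the output of every layer (assumed convention).
    hidden_states: Optional[Tuple[States, ...]] = None


class ToyEncoder:
    """Hypothetical stand-in for a text model or a vision encoder."""

    def __init__(self, num_layers: int, width: int) -> None:
        self.num_layers = num_layers
        self.width = width

    def embed(self, items: List[int]) -> States:
        # Toy embedding: each input id becomes a constant vector.
        return [[float(i)] * self.width for i in items]

    def layer(self, states: States) -> States:
        # Toy layer: add 1 to every element so layers are easy to tell apart.
        return [[x + 1.0 for x in vec] for vec in states]

    def __call__(self, items: List[int], output_hidden_states: bool = False) -> ToyOutput:
        states = self.embed(items)
        collected = [states]
        for _ in range(self.num_layers):
            states = self.layer(states)
            collected.append(states)
        return ToyOutput(
            last_hidden_state=states,
            hidden_states=tuple(collected) if output_hidden_states else None,
        )


class ToyVisionLanguageModel:
    """Hypothetical model with a separate text path and vision encoder."""

    def __init__(self) -> None:
        self.language_model = ToyEncoder(num_layers=3, width=4)
        self.vision_encoder = ToyEncoder(num_layers=2, width=4)


model = ToyVisionLanguageModel()
token_ids = [5, 7, 9]     # stand-in for tokenized text
patch_ids = [0, 1, 2, 3]  # stand-in for image patches

# Text side: ask for the hidden states, then pick the layer you want.
output = model.language_model(token_ids, output_hidden_states=True)
text_embeddings = output.hidden_states[-1]  # last layer; [0] is the embedding output

# Image side: call the vision encoder on its own.
image_embeddings = model.vision_encoder(patch_ids).last_hidden_state
```

With a real model the calls will look similar, but the attribute that holds the vision encoder and the layer you want from hidden_states both depend on the architecture, so treat the names here as placeholders.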