Phong Bui-Khanh

Results: 1 issue by Phong Bui-Khanh

How can I determine which region of the image the model is focusing on when answering a specific question?

Does InternVL use cross-attention between images and text? If so, how...