Michael Mai
Features become all NaN after the self.rnn layer, and the loss is NaN.
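A minimal sketch of how I'd localize this, assuming a PyTorch model (the layer types, sizes, and names below are illustrative, not the repo's actual code): register forward hooks to catch the first module that emits NaN/Inf, and clip gradients, since exploding RNN gradients are a common cause of NaN losses.

```python
import torch
import torch.nn as nn

def nan_hook(name):
    # Forward hook: flag the first module whose output contains NaN or Inf.
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        if torch.isnan(out).any() or torch.isinf(out).any():
            raise RuntimeError(f"NaN/Inf detected in output of {name}")
    return hook

# Hypothetical model: names mirror the issue's `self.rnn`, sizes are assumptions.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(input_size=768, hidden_size=512, batch_first=True)
        self.head = nn.Linear(512, 10)

    def forward(self, x):
        out, _ = self.rnn(x)        # GRU returns (output, h_n)
        return self.head(out[:, -1])

model = Net()
for name, module in model.named_modules():
    if name:  # skip the root module
        module.register_forward_hook(nan_hook(name))

x = torch.randn(4, 16, 768)
loss = model(x).sum()
loss.backward()
# Clipping the gradient norm before optimizer.step() often fixes NaN losses.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```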
Hi, I have checked the CLIP vision embedding (last hidden state) of BLIP-2 & InstructBLIP on Hugging Face (instructblip-vicuna-7b); its dimension is 257x1408. However, the multi-modal matching space of ViT-Lens uses a 1x768 dimension. I...
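For context, a minimal sketch of how I checked that shape, assuming the Hugging Face `transformers` InstructBLIP API (the image file is hypothetical):

```python
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b"
)

image = Image.open("example.jpg").convert("RGB")  # hypothetical local image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    out = model.vision_model(pixel_values=pixel_values)

# EVA-CLIP ViT-g/14 at 224x224: 256 patch tokens + 1 [CLS] token, hidden size 1408
print(out.last_hidden_state.shape)  # torch.Size([1, 257, 1408])
```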
In this table, it seems that DREAM used the evaluation performance of Subject 1, while other papers (like MindEye) reported the average performance across 4 subjects. Is that a fair comparison?