RickyYXY
+1 This is weird; I hope the authors can give us an answer. Thanks.
@juancamilog I read the code again and found that this may be because the authors only tried `head_idxs = [0, 2, 4, 5]` for ViTs. Since ViT-B/L/G...
Thanks for your reply. When using ViT models with more heads, can the indices 0, 2, 4, 5 stay the same? I'm not sure the larger models' heads will produce similar attention...
> Hi, I believe your batch size is too big. Can you try something smaller than `65536`?

I'll try your advice later. Thanks.
But it's odd that my code runs normally at the same batch size when xformers is not used, so I don't think the batch size itself is the problem?
> Sorry, the max bounds in the source are 65536, 128, and 32, but it looks like you should be able to fit within them by reshaping the tensor....
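For anyone hitting the same limit: one way to stay under the bound is to split the oversized dimension into chunks and run the attention call per chunk. This is only a sketch under the assumption that the `65536` bound applies to the leading (batch) dimension of the tensors passed to `xformers.ops.memory_efficient_attention`; the chunking logic itself is shown below, with the hypothetical attention call left as a comment.

```python
MAX_BATCH = 65536  # assumed per-call bound, from the limits quoted above


def batch_chunks(total: int, max_batch: int = MAX_BATCH):
    """Yield (start, end) index pairs so every chunk stays within the bound."""
    for start in range(0, total, max_batch):
        yield start, min(start + max_batch, total)


# Hypothetical usage, assuming q/k/v are [B, M, H, K] tensors:
# import torch
# import xformers.ops as xops
# out = torch.cat([
#     xops.memory_efficient_attention(q[s:e], k[s:e], v[s:e])
#     for s, e in batch_chunks(q.shape[0])
# ])
```

The same idea works if the bound is on a different dimension — reshape or slice so that each individual kernel call sees shapes within the limits, then concatenate the results.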
@wjf5203 Waiting for your code!!!