bkuster

Results 2 comments of bkuster

(This is speculation based on my understanding, not a 100% accurate answer.) 1) The "pretrain_mlp_adapter" is the file containing the multi-layer perceptron weights. (The output tokens of the CLIP encoder are converted into "visual"...

As a hack, you can try "merging" several images into one image, but you'd probably have to fine-tune the model a bit.
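
A rough sketch of what "merging" could look like, assuming it means tiling the images into a single grid before handing the result to the model. The function names `grid_offsets` and `merge_images`, the `tile_size` default, and the use of Pillow are my own assumptions for illustration, not something from the repo:

```python
import math


def grid_offsets(n_images, tile_size, cols=None):
    """Return (canvas_size, offsets) for laying out n_images square tiles in a grid.

    Hypothetical helper: the layout is row-major, and the column count defaults
    to a roughly square grid.
    """
    if n_images < 1:
        raise ValueError("need at least one image")
    if cols is None:
        cols = math.ceil(math.sqrt(n_images))
    rows = math.ceil(n_images / cols)
    offsets = [
        ((i % cols) * tile_size, (i // cols) * tile_size)
        for i in range(n_images)
    ]
    return (cols * tile_size, rows * tile_size), offsets


def merge_images(images, tile_size=224, cols=None):
    """Paste a list of PIL images into one grid image.

    tile_size=224 is an arbitrary choice for this sketch, not a value
    taken from the model's configuration.
    """
    from PIL import Image  # Pillow, imported lazily so the layout helper has no dependency

    canvas_size, offsets = grid_offsets(len(images), tile_size, cols)
    canvas = Image.new("RGB", canvas_size, (0, 0, 0))
    for img, xy in zip(images, offsets):
        tile = img.convert("RGB").resize((tile_size, tile_size))
        canvas.paste(tile, xy)
    return canvas
```

Keep in mind that the merged image still gets resized to the vision encoder's input resolution, so each sub-image ends up with fewer pixels than it would have had on its own. That is probably part of why some fine-tuning would be needed.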