
the pretrained model weights

runzeer opened this issue • 9 comments

Are any pretrained model weights released?

runzeer · May 07 '22 02:05

No. But it may be possible to make it work with the pretrained OPT weights recently released by Facebook and the pretrained CLIP weights.

sharifza · May 07 '22 18:05

> No. But maybe it's possible to make it work with the pretrained OPT weights recently released by facebook and the pretrained CLIP weights.

great idea :) will still need to train the cross-attention blocks (perceiver resampler + gated cross-attention), but should be doable
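A minimal sketch of that training setup, using tiny `nn.Linear` placeholders for the real components (a CLIP vision encoder, OPT decoder blocks, and the repo's `PerceiverResampler` / `GatedCrossAttentionBlock`): the pretrained parts are frozen and only the newly initialized cross-attention blocks receive gradients.

```python
import torch
from torch import nn

# Tiny nn.Linear stand-ins for the real pretrained components
# (in practice: a CLIP vision encoder and OPT decoder blocks).
vision_encoder = nn.Linear(32, 16)
language_model = nn.Linear(16, 16)

# Stand-ins for the newly initialized parts that still need training
# (flamingo-pytorch provides PerceiverResampler and GatedCrossAttentionBlock).
perceiver_resampler = nn.Linear(16, 16)
gated_cross_attn = nn.Linear(16, 16)

# Freeze the pretrained weights so only the new blocks receive gradients.
for module in (vision_encoder, language_model):
    for p in module.parameters():
        p.requires_grad_(False)

trainable = [p for m in (perceiver_resampler, gated_cross_attn) for p in m.parameters()]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

frozen = sum(p.numel() for m in (vision_encoder, language_model) for p in m.parameters())
print(sum(p.numel() for p in trainable), "trainable;", frozen, "frozen")
# → 544 trainable; 800 frozen
```

Only the placeholder modules are hypothetical; the freezing pattern carries over unchanged to the real pretrained models.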

lucidrains · May 08 '22 00:05

It would really be great to somehow allow loading pretrained weights from one (or several?) of the available pretrained models.

Do you think it would be really difficult (or unintuitive) to somehow allow pulling weights from arbitrary pretrained models (to an extent, something like GPT-J, GPT-Neo(X), GPT-2, etc.)?
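One way this could stay model-agnostic, sketched below with placeholders: wrap whatever block list the pretrained decoder exposes (e.g. GPT-2's `transformer.h` in Hugging Face Transformers) and interleave freshly initialized, tanh-gated layers. Because the gate starts at zero, the wrapped model initially behaves exactly like the original, regardless of which pretrained LM supplied the blocks. The `GatedBlock` here is a hypothetical stand-in, not the repo's implementation.

```python
import torch
from torch import nn

class GatedBlock(nn.Module):
    """Hypothetical stand-in for a gated cross-attention block: a tanh-gated
    residual whose gate starts at zero, so at init it is an identity map."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0
        self.ff = nn.Linear(dim, dim)

    def forward(self, x):
        return x + torch.tanh(self.gate) * self.ff(x)

def interleave(blocks, dim, every=1):
    """Insert a fresh GatedBlock before every `every`-th pretrained block."""
    out = []
    for i, blk in enumerate(blocks):
        if i % every == 0:
            out.append(GatedBlock(dim))
        out.append(blk)
    return nn.ModuleList(out)

# Stand-in for whatever block list the pretrained decoder exposes.
pretrained_blocks = nn.ModuleList(nn.Linear(8, 8) for _ in range(4))
layers = interleave(pretrained_blocks, dim=8, every=2)
print(len(layers))  # → 6 (4 pretrained + 2 gated)
```

Since the gated blocks are identities at initialization, passing an input through `layers` gives the same output as the untouched pretrained stack.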

TheodoreGalanos · May 08 '22 05:05

Are we planning to release any pretrained model weights in the future?

LITDataScience · May 28 '22 06:05

It's actually not a good idea, since CLIP is not task-agnostic: as explained in CLIP's paper, tasks with non-natural images such as EuroSAT perform poorly. So you would basically be making Flamingo look like a model that is one year older. "Flamingo achieves state-of-the-art performance across a wide range of benchmarks without training on commonly used and curated datasets such as VQAv2, COCO or ImageNet. Instead, Flamingo is trained solely on task-agnostic web scraped data."

edmondja · Sep 10 '22 12:09

> It's actually not a good idea, since CLIP is not task-agnostic: as explained in CLIP's paper, tasks with non-natural images such as EuroSAT perform poorly. So you would basically be making Flamingo look like a model that is one year older. "Flamingo achieves state-of-the-art performance across a wide range of benchmarks without training on commonly used and curated datasets such as VQAv2, COCO or ImageNet. Instead, Flamingo is trained solely on task-agnostic web scraped data."

Flamingo's vision encoder backbone is trained in a similar way to CLIP, with contrastive text-image training (see Section 3 of the paper). The data for CLIP is also scraped from the web, I think in a very similar way to Flamingo's. Therefore, neither Flamingo's training process nor the data it uses is necessarily more task-agnostic than CLIP's.
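For reference, that contrastive text-image objective can be sketched as a symmetric InfoNCE loss over a batch of paired embeddings. This is a minimal version: the real CLIP/Flamingo setup uses a learned temperature and much larger batches.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings:
    the i-th image should match the i-th text (the diagonal of `logits`)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature       # pairwise cosine similarities
    targets = torch.arange(len(logits))        # matching pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)    # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy batch of 4 paired 32-d embeddings.
img_emb, txt_emb = torch.randn(4, 32), torch.randn(4, 32)
loss = contrastive_loss(img_emb, txt_emb)
```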

If you want a properly task-agnostic backbone, it's probably better to use one trained with a self-supervised approach similar to BYOL. Nevertheless, the type of data is important: for example, you mentioned X-rays in the other thread, which can be quite different from data typically scraped from the web.
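As a rough illustration of the BYOL-style alternative, its core objective is just a regression between the online network's prediction and a stop-gradient target projection. The full method also needs two augmented views, an EMA target network, and a predictor head, all omitted in this sketch.

```python
import torch
import torch.nn.functional as F

def byol_loss(online_pred, target_proj):
    """BYOL's core objective: mean squared error between L2-normalized
    online predictions and stop-gradient target projections,
    equivalent to 2 - 2 * cosine_similarity."""
    p = F.normalize(online_pred, dim=-1)
    z = F.normalize(target_proj.detach(), dim=-1)  # no gradient into the target net
    return (2 - 2 * (p * z).sum(dim=-1)).mean()

p = torch.randn(8, 64)
print(float(byol_loss(p, p)))  # ≈ 0.0: identical views agree perfectly
```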

sharifza · Sep 10 '22 15:09

I tried this, but I was counting on Flamingo to help me with few-shot learning, because my dataset is not a real dataset per se but rather just a few examples of X-ray images.

edmondja · Sep 10 '22 16:09

You might be able to use Flamingo for your use case by prompting (I'm not sure if it works, but it's worth a try). By prompting I mean that you give it one X-ray image, describe what's important in the image, then give it a second image and ask your query.
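Such a few-shot prompt could be laid out along these lines. Everything here is purely illustrative, not an API of this repo: the `<image:...>` placeholder syntax, the `build_prompt` helper, and the "Findings:" prefix are all hypothetical.

```python
def build_prompt(examples, prefix="Findings:"):
    """Interleave support (image, description) pairs, then append the query
    image with an open-ended prompt. Each <image:...> placeholder marks where
    a visual embedding would be injected into the text stream."""
    parts = [f"<image:{image_id}> {prefix} {description}"
             for image_id, description in examples]
    parts.append(f"<image:query> {prefix}")
    return "\n".join(parts)

prompt = build_prompt([("xray_001", "clear lungs, no acute findings")])
print(prompt)
# → <image:xray_001> Findings: clear lungs, no acute findings
#   <image:query> Findings:
```

The model would then be asked to continue the text after the final prefix, describing the query image in the style of the support example.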

I think this is leading us off-topic here; if you want, we can discuss it in another thread.

sharifza · Sep 10 '22 18:09

Any news on the model weights? Thanks.

Ellyuca · Mar 20 '23 13:03