FaRL Text encoder

Text encoder

Open • renderless opened this issue 2 years ago • 1 comment

Thank you for your awesome work. Do you plan to release a pretrained text encoder?

May 27 '22 10:05 renderless

Hi, thanks for your interest. The pretrained backbones we released already contain the weights of the text encoder. In fact, you can load the FaRL weights into exactly the same network structure as CLIP ViT-B/16 and then use the model exactly like CLIP. Here is an example adapted from the CLIP usage example.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the CLIP ViT-B/16 architecture on the CPU first, then move it to the target device.
model, preprocess = clip.load("ViT-B/16", device="cpu")
model = model.to(device)

# Checkpoint download: https://github.com/FacePerceiver/FaRL#pre-trained-backbones
# map_location="cpu" lets the file load on machines without a GPU.
farl_state = torch.load("FaRL-Base-Patch16-LAIONFace20M-ep16.pth", map_location="cpu")
# Overwrite the CLIP weights with the FaRL weights (image and text encoders).
model.load_state_dict(farl_state["state_dict"], strict=False)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Embeddings from each encoder on its own.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Full forward pass: image-text similarity logits.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)

Jun 21 '22 07:06 yinglinzheng
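
If only the text encoder is of interest, one option is to filter the checkpoint's state dict down to the text-side entries before loading it. The sketch below is not from the FaRL authors. It assumes the checkpoint uses the same parameter names as the OpenAI CLIP model, where the image encoder lives under the `visual.` prefix and `logit_scale` is the shared temperature. The helper name `text_encoder_weights` and the toy dictionary are hypothetical and only for illustration.

```python
from typing import Any, Dict, Mapping

# Assumption: key names follow the OpenAI CLIP model layout, where the image
# encoder lives under "visual." and "logit_scale" is the shared temperature.
NON_TEXT_PREFIXES = ("visual.",)
NON_TEXT_KEYS = ("logit_scale",)


def text_encoder_weights(state_dict: Mapping[str, Any]) -> Dict[str, Any]:
    """Return only the entries that belong to the text encoder (hypothetical helper)."""
    return {
        name: value
        for name, value in state_dict.items()
        if not name.startswith(NON_TEXT_PREFIXES) and name not in NON_TEXT_KEYS
    }


# Toy stand-in for farl_state["state_dict"]; real values would be tensors.
toy_state = {
    "visual.conv1.weight": 0,
    "visual.proj": 1,
    "token_embedding.weight": 2,
    "positional_embedding": 3,
    "transformer.resblocks.0.attn.in_proj_weight": 4,
    "ln_final.weight": 5,
    "text_projection": 6,
    "logit_scale": 7,
}

text_only = text_encoder_weights(toy_state)
print(sorted(text_only))
```

With a real checkpoint, passing the filtered dictionary to `model.load_state_dict(..., strict=False)` should update only the text-side parameters and leave the image encoder at whatever weights the model already had. It is worth inspecting the value returned by `load_state_dict` to confirm which keys were actually matched.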
