What is the limit for text sequences?
I am trying to apply CLIP to a very specific dataset and need to fine-tune it. I am following the fine-tuning steps here: https://github.com/openai/CLIP/issues/83.
But I cannot figure out the maximum length of the text
sequence. Can it handle long texts, or do I have to restrict the text to a pre-defined limit?
Any help is appreciated.
According to CLIP's tokenizer function (clip.tokenize), it can encode sequences up to a context length of 77
(i.e., 77 tokens, not words) and raises a RuntimeError for anything longer.
The default context length is 77. To accept longer text sequences you need to enlarge the text encoder's position embeddings and then fine-tune, since the enlarged embeddings are newly initialized. Below is how you can do it.
from transformers import (
    CLIPTextConfig,
    CLIPVisionConfig,
    CLIPTextModelWithProjection,
    CLIPVisionModelWithProjection,
)

CLIP_CHECKPOINTS = "openai/clip-vit-base-patch32"
PROJECTION_DIM = 512        # Replace with your desired projection dimension
padding_max_length = 100    # Replace with your desired maximum position embeddings

textConfig = CLIPTextConfig.from_pretrained(CLIP_CHECKPOINTS)
textConfig.projection_dim = PROJECTION_DIM
textConfig.max_position_embeddings = padding_max_length

visionConfig = CLIPVisionConfig.from_pretrained(CLIP_CHECKPOINTS)
visionConfig.projection_dim = PROJECTION_DIM

# CLIP text model with a projection head on top.
# ignore_mismatched_sizes=True lets the checkpoint load even though the
# position-embedding table grew from 77 to padding_max_length rows; the
# enlarged table is newly initialized, which is why fine-tuning is needed.
clipTextModel = CLIPTextModelWithProjection.from_pretrained(
    pretrained_model_name_or_path=CLIP_CHECKPOINTS,
    config=textConfig,
    ignore_mismatched_sizes=True,
)

# CLIP vision (ViT) model with a projection head on top
clipVisionModel = CLIPVisionModelWithProjection.from_pretrained(
    pretrained_model_name_or_path=CLIP_CHECKPOINTS,
    config=visionConfig,
    ignore_mismatched_sizes=True,
)
This code snippet can help.
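For completeness, here is a sketch of how the resized text model could be used after the setup above. The checkpoint name and the value 100 mirror the snippet; the caption text is a made-up example. The key point is to tokenize with max_length equal to the enlarged limit rather than the default 77.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextConfig, CLIPTextModelWithProjection

CLIP_CHECKPOINTS = "openai/clip-vit-base-patch32"
padding_max_length = 100  # the enlarged position-embedding size from the setup

textConfig = CLIPTextConfig.from_pretrained(CLIP_CHECKPOINTS)
textConfig.max_position_embeddings = padding_max_length

clipTextModel = CLIPTextModelWithProjection.from_pretrained(
    CLIP_CHECKPOINTS, config=textConfig, ignore_mismatched_sizes=True
)

tokenizer = CLIPTokenizer.from_pretrained(CLIP_CHECKPOINTS)
# Pad/truncate to the enlarged limit instead of the default 77
inputs = tokenizer(
    "a long caption " * 20,
    padding="max_length",
    truncation=True,
    max_length=padding_max_length,
    return_tensors="pt",
)

with torch.no_grad():
    out = clipTextModel(**inputs)
print(out.text_embeds.shape)  # torch.Size([1, 512])
```

Note that because the enlarged position embeddings are freshly initialized, these outputs are only meaningful after fine-tuning on your dataset.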
Hello, I would like to ask where this code should be added. Is transformers a library? Looking forward to your response.
@chuyihuan If you are looking to fine-tune CLIP, the snippet can be useful; run it in your training script to construct the models before training. And yes, transformers is a library (Hugging Face Transformers).