LAVIS
Text tokenizer difference between foward and extract_feature
Hi,
I noticed that in blip2_qformer.py, the forward function truncates the text_tokens to a max_length of 32, while the extract_feature function, which to my understanding is an inference function, does not truncate them. So the text at inference could be much longer than it is during training in the forward function.
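To make the difference concrete, here is a minimal toy sketch of the two call patterns as I understand them (this is not the actual LAVIS code; the whitespace "tokenizer" below is just a stand-in to show the truncation behavior):

```python
# Toy stand-in for the tokenizer: splits on whitespace, optionally truncating,
# to illustrate the forward() vs extract_feature() difference described above.
def tokenize(text, truncation=False, max_length=None):
    tokens = text.split()
    if truncation and max_length is not None:
        tokens = tokens[:max_length]  # cap the sequence, as in training
    return tokens

caption = " ".join(f"word{i}" for i in range(40))  # a 40-word caption

train_tokens = tokenize(caption, truncation=True, max_length=32)  # forward()
infer_tokens = tokenize(caption)                                  # extract_feature()

print(len(train_tokens))  # 32
print(len(infer_tokens))  # 40
```

So at inference the model can see sequences longer than anything it was trained on, which is what I find confusing.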
May I ask why there is this difference? In particular, I do not understand why the text tokens are restricted to 32 during training.
Looking forward to the answer :) Thanks