VBench

Runtime error in metric 'scene' when decoding with BertLMHeadModel

Open • WenkunHe opened this issue 6 months ago • 5 comments

When I tried to evaluate a bunch of generated videos on the metric 'scene', I encountered the following problem:

  File "/xxx/anaconda3/envs/vbench/lib/python3.10/site-packages/vbench/third_party/tag2Text/tag2text.py", line 192, in generate
    outputs = self.text_decoder.generate(input_ids=input_ids,
  File "/xxx/anaconda3/envs/vbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
    raise AttributeError(
AttributeError: 'BertLMHeadModel' object has no attribute 'generate'

Looking at the code, the error seems to make sense:

  1. In compute_scene in scene.py, the model is built by the function tag2text_caption in third_party/tag2Text/tag2text.py, which constructs a Tag2Text_Caption module.
  2. Tag2Text_Caption sets self.text_decoder = BertLMHeadModel(config=decoder_config), and its generate method calls self.text_decoder.generate regardless of whether sample=True.
  3. BertLMHeadModel in third_party/tag2Text/med.py does not actually define generate. Neither do its ancestors BertPreTrainedModel and PreTrainedModel.
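Point 3 can be reproduced without transformers at all. In the sketch below, `GenerationMixin`, `PreTrainedModel`, `BertPreTrainedModel` and `BertLMHeadModel` are hypothetical stand-ins that only borrow the real names.

The sketch assumes the installed transformers is v4.50 or later, where `PreTrainedModel` no longer inherits from `GenerationMixin`. That version is not confirmed anywhere in this thread. Under that assumption, a custom model class only gets `generate` if the mixin appears somewhere in its own bases.

```python
# Hypothetical stand-ins mirroring the class layout described above;
# none of these are the real transformers or VBench classes.


class GenerationMixin:
    """Stand-in for the mixin that provides `generate`."""

    def generate(self, input_ids):
        return f"generated from {input_ids!r}"


class PreTrainedModel:
    """Stand-in base class that does NOT inherit GenerationMixin
    (the assumed behaviour of transformers >= 4.50)."""


class BertPreTrainedModel(PreTrainedModel):
    pass


class BertLMHeadModel(BertPreTrainedModel):
    """Mirrors med.py: has its own model code but no `generate`."""


class BertLMHeadModelWithGenerate(BertLMHeadModel, GenerationMixin):
    """Hypothetical patched class that lists the mixin explicitly."""


def has_generate(obj) -> bool:
    # hasattr walks the whole MRO, so this is True only if the class
    # itself or one of its ancestors defines `generate`.
    return hasattr(obj, "generate")


if __name__ == "__main__":
    print(has_generate(BertLMHeadModel()))              # False
    print(has_generate(BertLMHeadModelWithGenerate()))  # True
    print([c.__name__ for c in BertLMHeadModelWithGenerate.__mro__])
```

If this is the cause, there are two hedged options:

- Make the real class in third_party/tag2Text/med.py also inherit from `transformers.generation.GenerationMixin`. Newer transformers releases may still reject other parts of the custom decoder.
- Pin an older transformers release, as the replies in this thread suggest.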

Could anyone help me solve this problem?

WenkunHe • Jun 24 '25 21:06

I'm running into the same problem.

Lihui-Gu • Aug 15 '25 02:08

@WenkunHe @Lihui-Gu Hi, may I know what version of transformers you are using?

yinanhe • Aug 15 '25 03:08

@yinanhe I've run into the same problem as well, and I'm using transformers==4.33.2.

yingShen-ys • Nov 06 '25 01:11

@yingShen-ys Hello, testing shows that transformers 4.33.2 runs inference normally. You can refer to this issue: https://github.com/xinyu1205/recognize-anything/issues/218.

yinanhe • Nov 06 '25 03:11
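Because the advice above depends on the installed transformers version, it helps to confirm which version is actually importable in the vbench environment. The sketch below uses only the standard library (`importlib.metadata`).

The helper names are hypothetical. The (4, 50) threshold is an assumption: it marks the release from which `PreTrainedModel` is assumed to stop providing `generate`. The sketch does not explain the 4.33.2 report above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumed (not confirmed in this thread): first transformers release in which
# PreTrainedModel no longer inherits GenerationMixin.
GENERATE_REMOVED_IN = (4, 50)


def installed_version(dist="transformers"):
    """Return the installed version string of `dist`, or None if missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def major_minor(ver):
    """Extract (major, minor) from strings like '4.33.2' or '4.50.0.dev0'."""
    m = re.match(r"(\d+)\.(\d+)", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(m.group(1)), int(m.group(2))


def generate_probably_missing(ver):
    """True if `ver` is at or past the assumed threshold."""
    return major_minor(ver) >= GENERATE_REMOVED_IN


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("transformers is not installed in this environment")
    else:
        status = "probably lacks" if generate_probably_missing(v) else "probably has"
        print(f"transformers {v}: BertLMHeadModel {status} generate()")
```

Run it with the same interpreter that runs vbench. A different conda environment can report a different version than the one actually used during evaluation.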

> @yingShen-ys Hello, after testing, the transformers in version 4.33.2 can perform normal inference. You can refer to this issue xinyu1205/recognize-anything#218.

@yinanhe Thank you for the help! I also ran into another issue, again while evaluating the 'scene' metric: many parameters appear not to be initialized properly. Is this expected behavior? I installed vbench via pip install vbench.


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: 
['bert.encoder.layer.3.attention.output.LayerNorm.bias', .... ,'bert.encoder.layer.3.attention.self.query.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of BertModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: 
['bert.encoder.layer.0.crossattention.self.value.weight',..., 'bert.encoder.layer.1.crossattention.self.value.bias', 'bert.encoder.layer.0.crossattention.output.dense.weight', 'bert.encoder.layer.1.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.0.crossattention.output.dense.bias']

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 30524. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

load checkpoint from /home/.cache/vbench/caption_model/tag2text_swin_14m.pth

This is probably also related to #151.

yingShen-ys • Nov 06 '25 05:11
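In the log above, the "newly initialized" warnings appear before the line `load checkpoint from .../tag2text_swin_14m.pth`. Whether they matter therefore depends on whether that checkpoint later supplies those parameters.

The sketch below is one way to check. It makes three assumptions:

- You already hold the constructed model object.
- The .pth file is a dict whose weights may sit under a `"model"` key. This is common in BLIP-style checkpoints but is an assumption here.
- `compare_keys` and `load_checkpoint_keys` are hypothetical helpers, not part of vbench.

With a real PyTorch model, `model.load_state_dict(state_dict, strict=False)` reports the same two lists.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected), mirroring load_state_dict(strict=False):
    missing    = names the model expects but the checkpoint lacks,
    unexpected = checkpoint entries the model has no slot for."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


def load_checkpoint_keys(path):
    """Read parameter names from a .pth file (requires PyTorch).
    Assumes the state dict is either top-level or under a 'model' key."""
    import torch  # imported lazily so compare_keys works without PyTorch

    ckpt = torch.load(path, map_location="cpu")
    state = ckpt.get("model", ckpt)
    return list(state.keys())


# Hypothetical usage with a real model (not executed here):
#   missing, unexpected = compare_keys(model.state_dict().keys(),
#                                      load_checkpoint_keys(ckpt_path))
#   still_random = [k for k in missing if "crossattention" in k]
```

If `still_random` comes back empty, the cross-attention weights flagged during construction are overwritten by the checkpoint, and the warning is likely harmless.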