Runtime error in metric 'scene' when decoding with BertLMHeadModel
When I tried to evaluate a bunch of generated videos on the metric 'scene', I encountered the following problem:
```
  File "/xxx/anaconda3/envs/vbench/lib/python3.10/site-packages/vbench/third_party/tag2Text/tag2text.py", line 192, in generate
    outputs = self.text_decoder.generate(input_ids=input_ids,
  File "/xxx/anaconda3/envs/vbench/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
    raise AttributeError(
AttributeError: 'BertLMHeadModel' object has no attribute 'generate'
```
And the cause seems clear to me:
- In the function `compute_scene` of `scene.py`, we define the model through the function `tag2text_caption` in `third_party/tag2Text/tag2text.py`, which is linked to the module `Tag2Text_Caption`.
- In `Tag2Text_Caption`, we set `self.text_decoder = BertLMHeadModel(config=decoder_config)`, and call `self.text_decoder.generate` in the function `generate` regardless of whether `sample=True`.
- `BertLMHeadModel` in `third_party/tag2Text/med.py` does not actually define a function `generate`, and its ancestors `BertPreTrainedModel` and `PreTrainedModel` do not define `generate` either.
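A minimal, transformers-free sketch of why this `AttributeError` can appear: in recent `transformers` releases, `generate` is provided by `GenerationMixin` and is no longer inherited automatically by every `PreTrainedModel` subclass, so a custom decoder class written against an older layout loses the method. The class names below are illustrative stand-ins, not the real `transformers` classes.

```python
# Illustrative stand-ins only: `generate` lives on a mixin, and a subclass
# has the method only if the mixin is somewhere in its ancestry.
class GenerationMixin:
    def generate(self, input_ids):
        return f"decoded({input_ids})"

class OldPreTrainedModel(GenerationMixin):  # older layout: mixin baked in
    pass

class NewPreTrainedModel:  # newer layout: mixin must be added explicitly
    pass

class OldDecoder(OldPreTrainedModel):
    pass

class NewDecoder(NewPreTrainedModel):
    pass

print(hasattr(OldDecoder(), "generate"))  # True
print(hasattr(NewDecoder(), "generate"))  # False -> AttributeError when called
```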
Could anyone help me solve this problem?
I'm encountering the same problem.
@WenkunHe @Lihui-Gu Hi, may I know what version of transformers you are using?
@yinanhe I met the same problem as well, and I am using `transformers==4.33.2`.
@yingShen-ys Hello, after testing, transformers version 4.33.2 performs inference normally. You can refer to this issue: https://github.com/xinyu1205/recognize-anything/issues/218.
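Since the fix is version-sensitive, a quick sanity check against the version reported working in this thread (4.33.2) may help. `version_tuple` and `is_known_good` below are hypothetical helpers; in real code, `packaging.version.parse` would be the more robust choice.

```python
# Hypothetical helper: compare an installed transformers version string
# against the one this thread reports as working (4.33.2).
def version_tuple(v):
    # "4.33.2" -> (4, 33, 2); assumes plain numeric components
    return tuple(int(part) for part in v.split("."))

def is_known_good(installed, known_good="4.33.2"):
    return version_tuple(installed) == version_tuple(known_good)

print(is_known_good("4.33.2"))  # True
print(is_known_good("4.40.0"))  # False
```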
@yinanhe
Thank you for the help! I also encountered the following issue, where many parameters appear not to be properly initialized. This also happened when evaluating the metric 'scene'. Is this expected behavior? I installed vbench via `pip install vbench`.
```
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel:
['bert.encoder.layer.3.attention.output.LayerNorm.bias', .... ,'bert.encoder.layer.3.attention.self.query.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized:
['bert.encoder.layer.0.crossattention.self.value.weight',..., 'bert.encoder.layer.1.crossattention.self.value.bias', 'bert.encoder.layer.0.crossattention.output.dense.weight', 'bert.encoder.layer.1.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.0.crossattention.output.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 30524. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
load checkpoint from /home/.cache/vbench/caption_model/tag2text_swin_14m.pth
```
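For what it's worth, these warnings are usually benign here: Tag2Text adds cross-attention layers that the `bert-base-uncased` checkpoint never contained, so they start randomly initialized and are then overwritten when `tag2text_swin_14m.pth` is loaded afterwards. A dependency-free sketch of how such "newly initialized" keys arise (the key names below are made up for illustration):

```python
# Keys present in the model but absent from the checkpoint are the ones
# reported as "newly initialized" (illustrative key names only).
model_keys = {
    "encoder.layer.0.attention.self.query.weight",       # in bert-base-uncased
    "encoder.layer.0.crossattention.self.query.weight",  # added by Tag2Text
}
checkpoint_keys = {"encoder.layer.0.attention.self.query.weight"}

newly_initialized = sorted(model_keys - checkpoint_keys)
print(newly_initialized)  # ['encoder.layer.0.crossattention.self.query.weight']
```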
probably also related to #151