
failed to load model through jaxformer

Open · damengdameng opened this issue on Jul 11, 2023 · 0 comments

First of all, thank you for your excellent work and effort! I have some questions regarding running this project, and I hope you can help me out.

I followed the README to set up the environment, and I am confident it satisfies all the requirements of both this project and the jaxformer project. However, when I tried to load the pretrained CodeGen-6B-mono checkpoint with jaxformer.hf.codegen.modeling_codegen.CodeGenForCausalLM.from_pretrained, like this:

from jaxformer.hf import sample
from jaxformer.hf.codegen import modeling_codegen

model = modeling_codegen.CodeGenForCausalLM.from_pretrained("codegen-6B-mono", low_cpu_mem_usage=True)

I encountered the following error:

Traceback (most recent call last):
  File "jaxtest.py", line 16, in <module>
    from jaxformer.hf import sample
  File "/xxxx/ILF-for-code-generation-main/src/jaxformer/jaxformer/hf/sample.py", line 29, in <module>
    from jaxformer.hf.codegen.modeling_codegen import CodeGenForCausalLM
  File "/xxxx/ILF-for-code-generation-main/src/jaxformer/jaxformer/hf/codegen/modeling_codegen.py", line 27, in <module>
    from transformers.utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
ImportError: cannot import name 'add_code_sample_docstrings' from 'transformers.utils' (/xxxx/llm_env/ilf/lib/python3.7/site-packages/transformers/utils/__init__.py)
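
As far as I can tell, these docstring helpers lived in transformers.file_utils in older releases, so before changing versions I considered a stopgap shim under the pinned transformers 4.12.5. This is an untested sketch based on that assumption, not something from the README:

import transformers.utils as hf_utils
import transformers.file_utils as hf_file_utils

# Re-export the missing helpers under transformers.utils so that jaxformer's
# `from transformers.utils import ...` resolves (assumes these names still live
# in transformers.file_utils under transformers==4.12.5).
for _name in (
    "add_code_sample_docstrings",
    "add_start_docstrings",
    "add_start_docstrings_to_model_forward",
):
    if not hasattr(hf_utils, _name):
        setattr(hf_utils, _name, getattr(hf_file_utils, _name))

# After the shim, the original import should resolve.
from jaxformer.hf.codegen.modeling_codegen import CodeGenForCausalLM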

Instead, following this PR (https://github.com/salesforce/jaxformer/pull/30), I bumped transformers from 4.12.5 to 4.30.0. Then I got the following message:

Some weights of the model checkpoint at /xxxx/ILF-for-code-generation-main/checkpoints/codegen-6B-mono were not used when initializing CodeGenForCausalLM: ['transformer.h.15.attn.masked_bias', 'transformer.h.30.attn.bias', 'transformer.h.17.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.18.attn.bias', 'transformer.h.13.attn.masked_bias', 'transformer.h.12.attn.masked_bias', 'transformer.h.23.attn.bias', 'transformer.h.28.attn.bias', 'transformer.h.26.attn.masked_bias', 'transformer.h.9.attn.bias', 'transformer.h.15.attn.bias', 'transformer.h.21.attn.bias', 'transformer.h.19.attn.masked_bias', 'transformer.h.19.attn.bias', 'transformer.h.8.attn.bias', 'transformer.h.21.attn.masked_bias', 'transformer.h.30.attn.masked_bias', 'transformer.h.1.attn.bias', 'transformer.h.29.attn.bias', 'transformer.h.25.attn.bias', 'transformer.h.25.attn.masked_bias', 'transformer.h.22.attn.bias', 'transformer.h.17.attn.bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.31.attn.bias', 'transformer.h.13.attn.bias', 'transformer.h.14.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.2.attn.bias', 'transformer.h.6.attn.bias', 'transformer.h.20.attn.bias', 'transformer.h.4.attn.bias', 'transformer.h.26.attn.bias', 'transformer.h.0.attn.bias', 'transformer.h.27.attn.bias', 'transformer.h.20.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.5.attn.bias', 'transformer.h.10.attn.bias', 'transformer.h.31.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.16.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.27.attn.masked_bias', 'transformer.h.24.attn.bias', 'transformer.h.29.attn.masked_bias', 'transformer.h.32.attn.bias', 'transformer.h.11.attn.bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.16.attn.bias', 'transformer.h.3.attn.bias', 'transformer.h.28.attn.masked_bias', 'transformer.h.23.attn.masked_bias', 'transformer.h.7.attn.bias', 'transformer.h.32.attn.masked_bias', 'transformer.h.24.attn.masked_bias', 'transformer.h.12.attn.bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.14.attn.bias', 'transformer.h.22.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.18.attn.masked_bias']
- This IS expected if you are initializing CodeGenForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CodeGenForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of CodeGenForCausalLM were not initialized from the model checkpoint at /DATA/disk1/wanming/ILF-for-code-generation-main/checkpoints/codegen-6B-mono and are newly initialized: ['transformer.h.32.attn.causal_mask', 'transformer.h.24.attn.causal_mask', 'transformer.h.18.attn.causal_mask', 'transformer.h.14.attn.causal_mask', 'transformer.h.2.attn.causal_mask', 'transformer.h.11.attn.causal_mask', 'transformer.h.27.attn.causal_mask', 'transformer.h.1.attn.causal_mask', 'transformer.h.25.attn.causal_mask', 'transformer.h.22.attn.causal_mask', 'transformer.h.29.attn.causal_mask', 'transformer.h.30.attn.causal_mask', 'transformer.h.21.attn.causal_mask', 'transformer.h.10.attn.causal_mask', 'transformer.h.0.attn.causal_mask', 'transformer.h.17.attn.causal_mask', 'transformer.h.26.attn.causal_mask', 'transformer.h.23.attn.causal_mask', 'transformer.h.31.attn.causal_mask', 'transformer.h.3.attn.causal_mask', 'transformer.h.12.attn.causal_mask', 'transformer.h.5.attn.causal_mask', 'transformer.h.8.attn.causal_mask', 'transformer.h.4.attn.causal_mask', 'transformer.h.13.attn.causal_mask', 'transformer.h.9.attn.causal_mask', 'transformer.h.15.attn.causal_mask', 'transformer.h.6.attn.causal_mask', 'transformer.h.28.attn.causal_mask', 'transformer.h.16.attn.causal_mask', 'transformer.h.20.attn.causal_mask', 'transformer.h.7.attn.causal_mask', 'transformer.h.19.attn.causal_mask']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
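
To see whether the weights that matter actually loaded, I was going to run a quick generation smoke test along these lines (just a sketch: the Hugging Face tokenizer ID Salesforce/codegen-6B-mono and the prompt are my own assumptions, and model is the object returned by from_pretrained above):

import torch
from transformers import AutoTokenizer

# Assumption: the public Hugging Face tokenizer matches the local checkpoint.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-6B-mono")

inputs = tokenizer("def hello_world():", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=32,
        pad_token_id=tokenizer.eos_token_id,  # also silences the pad-token warning
    )
print(tokenizer.decode(output_ids[0]))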

It seems like the model did not load correctly. Could you please help me identify what went wrong in the process?
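
As a further cross-check, I may also try loading the same local checkpoint with the CodeGen implementation bundled with transformers 4.30 itself, to see whether it reports the same unused / newly initialized keys. I am not sure the jaxformer-converted checkpoint is fully compatible with that class, so this is only an experiment:

from transformers import CodeGenForCausalLM  # upstream implementation shipped with transformers >= 4.21

hf_model = CodeGenForCausalLM.from_pretrained(
    "codegen-6B-mono",        # same local checkpoint directory as above
    low_cpu_mem_usage=True,
)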

damengdameng · Jul 11 '23 09:07