
Doesn't support the latest version of transformers. IndexError: tuple index out of range

Open alexey-krasnov opened this issue 1 year ago • 3 comments

Hi! I tried to install the program on a Mac M2 and ran into an error with a fresh Python 3.11, transformers 4.34.0, and tokenizers 0.14.1. When I ran pipeline.py I got the following error; I also printed the `outputs` variable and its type from the model.py file:

```
python pipeline.py --model models --input tests/data/raw.txt --output out.json
```

```
Loading product extractor from models/prod...
Some weights of the model checkpoint at models/prod were not used when initializing BertCRFForTagging: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertCRFForTagging from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertCRFForTagging from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
done
Loading role extractor from models/role...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of the model checkpoint at models/role were not used when initializing BertCRFForRoleLabeling: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertCRFForRoleLabeling from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertCRFForRoleLabeling from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
done
BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[-7.2273e-01, -1.3122e-01, -1.1214e+00,  ...,  1.3058e+00, -7.9485e-01,  1.7867e+00],
         ...,
         [-1.9072e-01, -5.9194e-01,  9.7846e-01,  ...,  7.0911e-02, -2.7881e-01,  5.8151e-01]]]),
pooler_output=None, hidden_states=None, past_key_values=None, attentions=None, cross_attentions=None)
```

```
<class 'transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions'>
Traceback (most recent call last):
  File "/Users/alekseikrasov/Desktop/OntoChem/work/ChemRxnExtractor/ChemRxnExtractor/pipeline.py", line 22, in <module>
    rxns = rxn_extractor.get_reactions(sents[:10])
  File "/Users/alekseikrasov/Desktop/OntoChem/work/ChemRxnExtractor/ChemRxnExtractor/chemrxnextractor/cre.py", line 224, in get_reactions
    outputs = self.role_extractor(
  File "/Users/alekseikrasov/miniforge3/envs/ontochem/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/alekseikrasov/miniforge3/envs/ontochem/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/alekseikrasov/Desktop/OntoChem/work/ChemRxnExtractor/ChemRxnExtractor/chemrxnextractor/models/model.py", line 321, in forward
    extended_cls_h = outputs[1].unsqueeze(1).expand(batch_size, seq_length, hidden_dim)  # FIXME: this line causes the IndexError
  File "/Users/alekseikrasov/miniforge3/envs/ontochem/lib/python3.11/site-packages/transformers/utils/generic.py", line 405, in __getitem__
    return self.to_tuple()[k]
IndexError: tuple index out of range
```
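The failure mode can be reproduced without transformers at all. Note that the printed output above has `pooler_output=None`, and `ModelOutput.__getitem__` in transformers converts the output to a tuple of only the non-None fields, so `outputs[1]` goes out of range. A minimal sketch of that behavior (`FakeModelOutput` is a hypothetical stand-in, not the real transformers class):

```python
# Sketch of why outputs[1] breaks when pooler_output is None:
# ModelOutput indexing goes through to_tuple(), which keeps only
# the fields that are not None. With pooler_output=None the tuple
# has length 1, so index 1 raises IndexError.

class FakeModelOutput:
    def __init__(self, last_hidden_state, pooler_output=None):
        self.last_hidden_state = last_hidden_state
        self.pooler_output = pooler_output

    def to_tuple(self):
        # mimics transformers.utils.generic.ModelOutput.to_tuple()
        return tuple(
            v for v in (self.last_hidden_state, self.pooler_output)
            if v is not None
        )

    def __getitem__(self, k):
        return self.to_tuple()[k]

outputs = FakeModelOutput(last_hidden_state="hidden", pooler_output=None)
try:
    outputs[1]  # what model.py line 321 effectively does
    failed = False
except IndexError:
    failed = True

# Accessing the field by name still works:
assert failed and outputs.last_hidden_state == "hidden"
```

This suggests one possible (untested) patch for model.py: read the field by name, e.g. `outputs.last_hidden_state[:, 0]` for the CLS vector, instead of `outputs[1]` — noting that this is the raw CLS hidden state, not the tanh-pooled output the old code received.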

I tried to use the old transformers v3.0.2, but I cannot compile tokenizers==0.8.1.rc1 (required by transformers==3.0.2) with the existing Rust compiler. The same problem occurred on a Linux openSUSE machine.

Could you please help solve this problem and, if possible, update the code and requirements so that one could use the latest versions of Python, transformers, and tokenizers?

alexey-krasnov avatar Oct 17 '23 09:10 alexey-krasnov

I have met the same problem. Is there a solution?

LingjieBao1998 avatar Oct 31 '23 19:10 LingjieBao1998

Unfortunately, the installation failed on a Mac with ARM architecture; however, I managed to install and run it on a Linux machine with the required dependencies. The only thing I can recommend is using Python < 3.9 on Linux together with the rest of the installation instructions.

alexey-krasnov avatar Nov 07 '23 12:11 alexey-krasnov

Hi @LingjieBao1998, I have found a solution for Mac users with Apple silicon. You can follow these instructions to install ChemRxnExtractor:

1. Install a compatible conda via either Miniforge or Anaconda (Miniconda). We recommend Miniforge:

```
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```

and follow the prompts.

If you have already installed Anaconda (Miniconda), update it to the latest version:

```
conda update conda       # update the conda package manager
conda update anaconda    # update Anaconda itself
```

Then check the Anaconda version: the 2022.05 release of Anaconda Distribution features native compiling for Apple M1's ARM64 architecture. Set conda-forge as the priority channel:

```
conda config --add channels conda-forge
conda config --set channel_priority strict
```

2. Create an environment and install the essential libraries:

```
conda create --name ENV_NAME "python<3.12"
conda activate ENV_NAME
pip install pyproject-toml torch tqdm numpy seqeval -U
```

It’s important to use the conda-forge channel to install the following versions of tokenizers and transformers. Find the tokenizers version that matches your Python version:

```
conda search tokenizers
```

E.g., these combinations work fine:

  • for Python 3.9: tokenizers=0.10.1 and transformers=3.0.2
  • for Python 3.11: tokenizers=[0.13.1, 0.13.2] and transformers=[3.0.2, 3.1.0]

Install tokenizers first, then transformers:

```
conda install -c conda-forge tokenizers=0.13.2
conda install -c conda-forge transformers=3.1.0
```

3. Install ChemRxnExtractor:

```
git clone https://github.com/jiangfeng1124/ChemRxnExtractor
cd ChemRxnExtractor
pip install -e .
```

4. If the following error occurs:

```
line XXX, in __init__
    BertWordPieceTokenizer(
TypeError: __init__() got an unexpected keyword argument 'vocab_file'
```

please go to the file

```
/Users/USER_NAME/miniforge3/envs/ENV_NAME/lib/python3.YY/site-packages/transformers/tokenization_bert.py
```

and on line XXX change 'vocab_file' to 'vocab'.
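As an alternative to editing the installed transformers source by hand, a small compatibility shim can pick the right keyword automatically. This is only a sketch: `make_vocab_kwargs` and the two stand-in classes are hypothetical, and it assumes the `BertWordPieceTokenizer` parameter was renamed from `vocab_file` (older tokenizers releases) to `vocab` (newer ones), which is exactly the mismatch the error message shows.

```python
import inspect

def make_vocab_kwargs(tokenizer_cls, vocab_path):
    """Build the vocab keyword argument under whichever name the
    installed tokenizer class accepts ('vocab' or 'vocab_file')."""
    params = inspect.signature(tokenizer_cls.__init__).parameters
    key = "vocab" if "vocab" in params else "vocab_file"
    return {key: vocab_path}

# Stand-ins for the two API generations (hypothetical classes,
# used here only to demonstrate the shim):
class OldStyleTokenizer:
    def __init__(self, vocab_file=None):
        self.vocab = vocab_file

class NewStyleTokenizer:
    def __init__(self, vocab=None):
        self.vocab = vocab

old_kwargs = make_vocab_kwargs(OldStyleTokenizer, "vocab.txt")
new_kwargs = make_vocab_kwargs(NewStyleTokenizer, "vocab.txt")
assert old_kwargs == {"vocab_file": "vocab.txt"}
assert new_kwargs == {"vocab": "vocab.txt"}

# Either class can now be constructed without a TypeError:
assert OldStyleTokenizer(**old_kwargs).vocab == "vocab.txt"
assert NewStyleTokenizer(**new_kwargs).vocab == "vocab.txt"
```

The advantage over patching tokenization_bert.py in place is that the shim survives a reinstall or upgrade of the transformers package.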

alexey-krasnov avatar Nov 07 '23 14:11 alexey-krasnov