ChemRxnExtractor
Doesn't support the latest version of transformers. IndexError: tuple index out of range
Hi! I tried to install the program on a Mac M2 and hit this error with a fresh Python 3.11, transformers 4.34.0, and tokenizers 0.14.1. When I ran pipeline.py I got the following error; I also printed the outputs variable and its type from the model.py file:
python pipeline.py --model models --input tests/data/raw.txt --output out.json
Loading product extractor from models/prod...Some weights of the model checkpoint at models/prod were not used when initializing BertCRFForTagging: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertCRFForTagging from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertCRFForTagging from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
done
Loading role extractor from models/role...Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of the model checkpoint at models/role were not used when initializing BertCRFForRoleLabeling: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertCRFForRoleLabeling from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertCRFForRoleLabeling from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
done
BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[-7.2273e-01, -1.3122e-01, -1.1214e+00, ..., 1.3058e+00, -7.9485e-01, 1.7867e+00], ... (remaining tensor values elided) ...]]), pooler_output=None, hidden_states=None, past_key_values=None, attentions=None, cross_attentions=None)
<class 'transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions'>
Traceback (most recent call last):
File "/Users/alekseikrasov/Desktop/OntoChem/work/ChemRxnExtractor/ChemRxnExtractor/pipeline.py", line 22, in
I also tried the old transformers v3.0.2, but I cannot compile tokenizers==0.8.1.rc1 (required by transformers==3.0.2) with the current Rust compiler. The same problem occurred on a Linux openSUSE machine.
Could you please help solve this problem, and perhaps update the code and requirements so that one could use current versions of Python, transformers, and tokenizers?
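For context, this IndexError is likely a consequence of how the return type changed: in transformers >= 4, forward passes return ModelOutput objects whose integer indexing goes through to_tuple(), which drops every field that is None, so old code written against the positional tuple API can index past the end. A minimal pure-Python sketch of that behavior (the class below only mimics transformers' ModelOutput, it is not the real one):

```python
class ModelOutputLike:
    """Mimics transformers' ModelOutput indexing: integer indexes go
    through to_tuple(), which drops every field that is None."""

    def __init__(self, last_hidden_state, pooler_output=None):
        self.last_hidden_state = last_hidden_state
        self.pooler_output = pooler_output

    def to_tuple(self):
        # None fields (here pooler_output=None, as in the log above)
        # are silently skipped.
        return tuple(
            v for v in (self.last_hidden_state, self.pooler_output)
            if v is not None
        )

    def __getitem__(self, index):
        return self.to_tuple()[index]


out = ModelOutputLike(last_hidden_state="hidden", pooler_output=None)
print(out[0])   # last_hidden_state is still reachable positionally
try:
    out[1]      # old code expecting (hidden_state, pooled) fails here
except IndexError as e:
    print(e)    # tuple index out of range
```

This matches the printed output above, where pooler_output=None, so any positional access to the second element raises the reported error.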
I have run into the same problems. Is there a solution?
Unfortunately, the installation failed on a Mac with ARM architecture; however, I managed to install and run it on a Linux machine with the required dependencies. The only thing I can recommend is using Python<3.9 on Linux and following the rest of the installation instructions.
Hi @LingjieBao1998, I have found a solution for Mac users with Apple silicon. You can follow these instructions to install ChemRxnExtractor:
1. Install a compatible conda with either Miniforge or Anaconda (Miniconda). We recommend using Miniforge:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
and follow the prompts.
If you have already installed Anaconda (Miniconda), update it to the latest version:
# update the conda package manager to the latest version
conda update conda
# use conda to update Anaconda to the latest version
conda update anaconda
Check the version of Anaconda: the 2022.05 release of Anaconda Distribution features native compiling for Apple M1's ARM64 architecture.
Set conda-forge as the priority channel:
conda config --add channels conda-forge
conda config --set channel_priority strict
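Before creating the environment, you can confirm the channel setup took effect (this just prints conda's current configuration):

```shell
# conda-forge should be listed first, and channel_priority should be strict
conda config --show channels channel_priority
```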
2. Create environment and install essential libraries.
conda create --name ENV_NAME "python<3.12"
conda activate ENV_NAME
pip install -U pyproject-toml torch tqdm numpy seqeval
It is important to use the conda-forge channel to install compatible versions of tokenizers and transformers. Find the tokenizers version that matches your Python version:
conda search tokenizers
E.g., these combinations work fine:
- for Python 3.9: tokenizers=0.10.1, transformers=3.0.2
- for Python 3.11: tokenizers=[0.13.1, 0.13.2] and transformers=[3.0.2, 3.1.0]
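The combinations above can be captured in a small helper that picks the pins for the running interpreter (a sketch; only the pairs reported in this thread are listed, and other versions may also work):

```python
import sys

# Known-working (tokenizers, transformers) pins reported above,
# keyed by (major, minor) Python version.
COMPAT = {
    (3, 9): ("0.10.1", "3.0.2"),
    (3, 11): ("0.13.2", "3.1.0"),
}


def suggested_pins(version_info=sys.version_info):
    """Return (tokenizers, transformers) pins, or None if untested."""
    return COMPAT.get((version_info[0], version_info[1]))


print(suggested_pins((3, 11, 0)))  # ('0.13.2', '3.1.0')
```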
Install tokenizers then transformers:
conda install -c conda-forge tokenizers=0.13.2
conda install -c conda-forge transformers=3.1.0
3. Install ChemRxnExtractor
git clone https://github.com/jiangfeng1124/ChemRxnExtractor
cd ChemRxnExtractor
pip install -e .
4. If the following error occurs:
line XXX, in __init__ BertWordPieceTokenizer( TypeError: __init__() got an unexpected keyword argument 'vocab_file'
Please go to the file:
/Users/USER_NAME/miniforge3/envs/ENV_NAME/lib/python3.YY/site-packages/transformers/tokenization_bert.py
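As far as I can tell, this TypeError comes from newer tokenizers releases renaming the first argument of BertWordPieceTokenizer from vocab_file to vocab, so one version-agnostic way to patch the failing call is to pick the keyword by inspecting the signature. This is only a sketch: the stub class below stands in for the real tokenizers.BertWordPieceTokenizer so the idea runs without the library installed.

```python
import inspect


# Stub standing in for tokenizers.BertWordPieceTokenizer; in newer
# releases its first argument is `vocab`, in older ones `vocab_file`.
class BertWordPieceTokenizer:
    def __init__(self, vocab=None, lowercase=True):
        self.vocab = vocab
        self.lowercase = lowercase


def make_tokenizer(vocab_path, **kwargs):
    """Pass the vocab path under whichever keyword this version accepts."""
    params = inspect.signature(BertWordPieceTokenizer.__init__).parameters
    key = "vocab" if "vocab" in params else "vocab_file"
    return BertWordPieceTokenizer(**{key: vocab_path}, **kwargs)


tok = make_tokenizer("models/prod/vocab.txt")
print(tok.vocab)  # models/prod/vocab.txt
```

Alternatively, simply replacing vocab_file= with vocab= at the failing call site in tokenization_bert.py should have the same effect on the versions pinned above.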