Tatoeba-Challenge

convert_marian_tatoeba_to_pytorch FileNotFoundError

Open: Lyaaaaaaaaaaaaaaa opened this issue 2 years ago • 5 comments

Hello, I'm trying to convert more models to the PyTorch format, but I'm getting an error.

I'm running the convert_marian_tatoeba_to_pytorch script, but it seems to be looking for a README.md file in the models/results folder, and there is none.

Traceback (most recent call last):
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 1282, in <module>
    resolver = TatoebaConverter(save_dir=args.save_dir)
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 58, in __init__
    reg = self.make_tatoeba_registry()
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 264, in make_tatoeba_registry
    lns = list(open(p / "README.md").readlines())
FileNotFoundError: [Errno 2] No such file or directory: 'Tatoeba-Challenge/models/results/README.md'
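The traceback shows make_tatoeba_registry opening README.md under models/results in the repository checkout. A small pre-flight check can report the missing file before the converter is constructed. This is a sketch only: it assumes the checkout lives at Tatoeba-Challenge/ as in the traceback, and the helper name check_results_readme is hypothetical.

```python
from pathlib import Path


def check_results_readme(repo_root: str = "Tatoeba-Challenge") -> Path:
    """Return the README path the converter reads, or raise a clear error.

    Hypothetical helper: the relative path models/results/README.md is taken
    from the FileNotFoundError in the traceback above.
    """
    readme = Path(repo_root) / "models" / "results" / "README.md"
    if not readme.is_file():
        raise FileNotFoundError(
            f"{readme} is missing; the Tatoeba converter reads it to build its model registry"
        )
    return readme


if __name__ == "__main__":
    try:
        print(f"found {check_results_readme()}")
    except FileNotFoundError as err:
        print(err)
```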

Lyaaaaaaaaaaaaaaa, Oct 30 '22

Could you try this script: https://github.com/Helsinki-NLP/Opus-MT/blob/master/hf/convert_to_pytorch.py

jorgtied, Jan 12 '23

Hello, I will try this one and let you know how it goes.

Lyaaaaaaaaaaaaaaa, Jan 16 '23

Hello, sorry for the long delay. I ran your script and got a different error: TypeError: expected str, bytes or os.PathLike object, not NoneType

The logs:

python3 model_converter/convert_to_pytorch.py --model-path opus-en-pt --dest-path converted/opus-en-pt

added 1 tokens to vocab
Traceback (most recent call last):
  File "/home/path_to_project/model_converter/convert_to_pytorch.py", line 28, in <module>
    convert(Path(args.model_path), Path(args.dest_path))
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 663, in convert
    opus_state = OpusState(source_dir)
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 494, in __init__
    self.tokenizer = self.load_tokenizer()
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 593, in load_tokenizer
    return MarianTokenizer.from_pretrained(str(self.source_dir))
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/tokenization_marian.py", line 158, in __init__
    assert Path(source_spm).exists(), f"cannot find spm source {source_spm}"
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 1082, in __new__
    self = cls._from_parts(args, init=False)
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 707, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 691, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
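The last frames show pathlib.Path(source_spm) being called while source_spm is None, i.e. the tokenizer was constructed without a SentencePiece source file. The error itself can be reproduced with the standard library alone (the exact message wording differs between Python versions). The guard require_file below is hypothetical and only illustrates raising a clearer message instead of the bare TypeError.

```python
from pathlib import Path
from typing import Optional


def reproduce() -> str:
    """Trigger the same TypeError as the traceback by passing None to Path."""
    try:
        Path(None)  # fails inside pathlib's argument parsing
    except TypeError as err:
        return str(err)
    return "no error"


def require_file(path: Optional[str], what: str) -> Path:
    """Hypothetical guard: fail with a readable message when a path is None or missing."""
    if path is None:
        raise FileNotFoundError(f"no {what} file was found in the model directory")
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"cannot find {what}: {p}")
    return p


if __name__ == "__main__":
    print(reproduce())
```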

Additional information:

  • I'm running the script in a Miniconda environment (created with Miniconda3-py39_23.1.0-1-Linux-x86_64.sh). Here are the environment's packages:
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
accelerate                0.18.0             pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.4            py39h72bdee0_0    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
arrow-cpp                 11.0.0          ha770c72_13_cpu    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
aws-c-auth                0.6.26               hf365957_1    conda-forge
aws-c-cal                 0.5.21               h48707d8_2    conda-forge
aws-c-common              0.8.14               h0b41bf4_0    conda-forge
aws-c-compression         0.2.16               h03acc5a_5    conda-forge
aws-c-event-stream        0.2.20               h00877a2_4    conda-forge
aws-c-http                0.7.6                hf342b9f_0    conda-forge
aws-c-io                  0.13.19              h5b20300_3    conda-forge
aws-c-mqtt                0.8.6               hc4349f7_12    conda-forge
aws-c-s3                  0.2.7                h909e904_1    conda-forge
aws-c-sdkutils            0.1.8                h03acc5a_0    conda-forge
aws-checksums             0.1.14               h03acc5a_5    conda-forge
aws-crt-cpp               0.19.8              hf7fbfca_12    conda-forge
aws-sdk-cpp               1.10.57              h17c43bd_8    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_3    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
cryptography              40.0.1           py39h079d5ae_0    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.4.1.50             hed8a83a_0    conda-forge
dataclasses               0.8                pyhc8e2a94_3    conda-forge
datasets                  2.11.0             pyhd8ed1ab_0    conda-forge
dill                      0.3.6              pyhd8ed1ab_1    conda-forge
filelock                  3.10.7             pyhd8ed1ab_0    conda-forge
frozenlist                1.3.3            py39hb9d737c_0    conda-forge
fsspec                    2023.3.0           pyhd8ed1ab_1    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
huggingface_hub           0.13.3             pyhd8ed1ab_0    conda-forge
icu                       72.1                 hcb278e6_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.1.0              pyha770c72_0    conda-forge
importlib_metadata        6.1.0                hd8ed1ab_0    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libabseil                 20230125.0      cxx17_hcb278e6_1    conda-forge
libarrow                  11.0.0          h93537a5_13_cpu    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcurl                   7.88.1               hdc1c0ab_1    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               h28343ad_4    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libgoogle-cloud           2.8.0                h0bc5f78_1    conda-forge
libgrpc                   1.52.1               hcf146ea_1    conda-forge
libhwloc                  2.9.0                hd6dc26d_0    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnuma                   2.0.16               h0b41bf4_1    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsentencepiece          0.1.97               h47aad16_1    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libthrift                 0.18.1               h5e4af38_0    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libxml2                   2.10.3               hfdac1af_6    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvm-openmp               16.0.0               h417c0b6_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
magma                     2.6.2                hc72dce7_0    conda-forge
mkl                       2022.2.1         h84fe81f_16997    conda-forge
multidict                 6.0.4            py39h72bdee0_0    conda-forge
multiprocess              0.70.14          py39hb9d737c_3    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
ninja                     1.11.1               h924138e_0    conda-forge
numpy                     1.24.2           py39h7360e5f_0    conda-forge
openssl                   3.1.0                h0b41bf4_0    conda-forge
orc                       1.8.3                hfdbbad2_0    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3            py39h2ad29b5_1    conda-forge
parquet-cpp               1.5.1                         2    conda-forge
pip                       23.0.1             pyhd8ed1ab_0    conda-forge
psutil                    5.9.4            py39hb9d737c_0    conda-forge
pyarrow                   11.0.0          py39hf0ef2fd_13_cpu    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyopenssl                 23.1.1             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.7           hf930737_3_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-xxhash             3.2.0            py39h72bdee0_0    conda-forge
python_abi                3.9                      3_cp39    conda-forge
pytorch                   1.13.1          cuda112py39hb0b7ed5_200    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0              py39hb9d737c_5    conda-forge
re2                       2023.02.02           hcb278e6_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
regex                     2023.3.23        py39h72bdee0_0    conda-forge
requests                  2.28.2             pyhd8ed1ab_1    conda-forge
responses                 0.18.0             pyhd8ed1ab_0    conda-forge
s2n                       1.3.41               h3358134_0    conda-forge
sacremoses                0.0.53             pyhd8ed1ab_0    conda-forge
sentencepiece             0.1.97               hf3d152e_1    conda-forge
sentencepiece-python      0.1.97           py39h0fce851_1    conda-forge
sentencepiece-spm         0.1.97               h47aad16_1    conda-forge
setuptools                67.6.1             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sleef                     3.5.1                h9b69904_2    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sqlite                    3.40.0               h4ff8645_0    conda-forge
tbb                       2021.8.0             hf52228f_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tokenizers                0.13.2           py39h585fa2d_0    conda-forge
tqdm                      4.65.0             pyhd8ed1ab_1    conda-forge
transformers              4.27.4             pyhd8ed1ab_0    conda-forge
typing-extensions         4.5.0                hd8ed1ab_0    conda-forge
typing_extensions         4.5.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
ucx                       1.14.0               h538f049_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
websockets                10.4             py39hb9d737c_1    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
xxhash                    0.8.1                h0b41bf4_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yarl                      1.8.2            py39hb9d737c_0    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge
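When reporting an environment, the versions of the few packages involved in a Marian conversion are often enough. The sketch below prints them using importlib.metadata; the choice of packages in RELEVANT is our assumption (transformers and tokenizers appear in the traceback, sentencepiece backs MarianTokenizer, torch is needed to write the converted weights).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed selection of packages relevant to converting a Marian model.
RELEVANT = ("transformers", "sentencepiece", "tokenizers", "torch")


def package_versions(names: Iterable[str] = RELEVANT) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in package_versions().items():
        print(f"{name:15} {ver or 'not installed'}")
```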

Lyaaaaaaaaaaaaaaa, Apr 01 '23

Did you download the model that you want to convert? The script expects the model to be in the model path you specify on the command line. Maybe this Makefile helps you see how I use the script for converting models: https://github.com/Helsinki-NLP/Opus-MT/blob/master/hf/Makefile
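The earlier traceback shows convert_to_pytorch.py calling convert(Path(args.model_path), Path(args.dest_path)) from transformers.models.marian.convert_marian_to_pytorch. A minimal wrapper in that spirit, which first checks that the model path points at a non-empty downloaded model directory, might look like this. It is a sketch, not the Opus-MT script itself, and validate_model_dir is a hypothetical helper.

```python
import argparse
from pathlib import Path


def validate_model_dir(model_path: Path) -> Path:
    """Check that the downloaded Marian model directory exists and is not empty."""
    if not model_path.is_dir():
        raise SystemExit(f"model path {model_path} does not exist or is not a directory")
    if not any(model_path.iterdir()):
        raise SystemExit(f"model path {model_path} is empty; download and unpack the model first")
    return model_path


def main() -> None:
    # Same flags as the command line used in this thread.
    parser = argparse.ArgumentParser(description="Convert a downloaded OPUS-MT Marian model")
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--dest-path", required=True)
    args = parser.parse_args()

    source = validate_model_dir(Path(args.model_path))
    # Imported lazily so the directory check runs even without transformers installed.
    from transformers.models.marian.convert_marian_to_pytorch import convert

    convert(source, Path(args.dest_path))


if __name__ == "__main__":
    main()
```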

jorgtied, Apr 01 '23

Hello, yes, I downloaded the model I want to convert, opus-en-pt. I believe I downloaded the right format. Just in case, here is the list of files in the opus-en-pt folder:

decoder.yml
opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus.bpe32k-bpe32k.transformer.valid1.log
postprocess.sh
README.md
source.tcmodel
tokenizer_config.json
LICENSE
opus.bpe32k-bpe32k.transformer.train1.log
opus.bpe32k-bpe32k.vocab.yml
preprocess.sh
source.bpe
target.bpe
vocab.json

I'm having difficulty understanding the Makefile.
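The file listing above contains source.bpe and target.bpe but no source.spm or target.spm. As far as we can tell, MarianTokenizer looks for SentencePiece models under those .spm names; if source.spm is absent, source_spm stays None, which is consistent with the Path(None) failure in the earlier traceback. A quick check, assuming those file names (verify them against your transformers version):

```python
from pathlib import Path
from typing import Dict

# File names MarianTokenizer is assumed to look for; check against your transformers version.
EXPECTED = ("source.spm", "target.spm", "vocab.json")


def tokenizer_files(model_dir: str) -> Dict[str, bool]:
    """Report which of the expected tokenizer files exist in model_dir."""
    root = Path(model_dir)
    return {name: (root / name).is_file() for name in EXPECTED}


if __name__ == "__main__":
    for name, present in tokenizer_files("opus-en-pt").items():
        print(f"{name:12} {'ok' if present else 'MISSING'}")
```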

Lyaaaaaaaaaaaaaaa, Apr 02 '23