504 Server error when running comet-score using multiple machines

Open Smu-Tan opened this issue 1 year ago • 10 comments

🐛 Bug

Hi! A 504 server error is encountered when running multiple comet-score scripts. See below:

Traceback (most recent call last):
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 261, in hf_raise_for_status
    response.raise_for_status()
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/models/Unbabel/wmt22-comet-da/revision/main

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/models/__init__.py", line 46, in download_model
    model_path = snapshot_download(
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py", line 186, in snapshot_download
    repo_info = api.repo_info(repo_id=repo_id, repo_type=repo_type, revision=revision, token=token)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1868, in repo_info
    return method(
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1678, in model_info
    hf_raise_for_status(r)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 303, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/models/Unbabel/wmt22-comet-da/revision/main

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/models/__init__.py", line 51, in download_model
    checkpoint_path = download_model_legacy(model, saving_directory)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/models/download_utils.py", line 224, in download_model_legacy
    raise Exception(
Exception: Unbabel/wmt22-comet-da is not in the available_legacy_metrics or is a valid checkpoint folder.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stan1/anaconda3/envs/prefix_mt/bin/comet-score", line 8, in <module>
    sys.exit(score_command())
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/cli/score.py", line 154, in score_command
    model_path = download_model(cfg.model, saving_directory=cfg.model_storage_path)
  File "/home/stan1/anaconda3/envs/prefix_mt/lib/python3.9/site-packages/comet/models/__init__.py", line 53, in download_model
    raise KeyError(f"Model {model} not supported by COMET.")
KeyError: Model Unbabel/wmt22-comet-da not supported by COMET.
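A note on reading the trace above: the final KeyError is only the last link of a chain, and the 504 from the Hub is the first one. The sketch below is a generic helper for surfacing that first link when logging a failure. The names `exception_chain` and `root_cause` are hypothetical and are not part of COMET or huggingface_hub.

```python
def exception_chain(exc):
    """Return [exc, what triggered it, ...], outermost exception first."""
    chain, seen = [], set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against reference cycles
        chain.append(exc)
        # An explicit `raise ... from e` sets __cause__; an exception raised
        # while another one is being handled only sets __context__.
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__
    return chain


def root_cause(exc):
    """Innermost exception, i.e. the one that started the chain."""
    return exception_chain(exc)[-1]
```

Applied to the failure above, `root_cause` would presumably return the `requests` HTTPError with the 504 rather than the "not supported by COMET" KeyError.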

To Reproduce

Here's the reproduction code template; please ignore the task and seed settings.

#!/bin/bash

RESULT_DIR=zero-shot

TASKS=(zs)
SEEDS=(1234)
SRCAR=('de' 'nl' 'sv' 'da' 'is')
TGTAR=('de' 'nl' 'sv' 'da' 'is')

# Give every (task, seed, src, tgt) combination a unique index and run only
# the combination whose index equals this job's SLURM_ARRAY_TASK_ID.
for (( t=0; t<${#TASKS[@]}; t++ )); do
  for (( s=0; s<${#SEEDS[@]}; s++ )); do
    first_id=$((t*${#SEEDS[@]}+s))
    for (( i=0; i<${#SRCAR[@]}; i++ )); do
      second_id=$((first_id*${#SRCAR[@]}+i))
      for (( j=0; j<${#TGTAR[@]}; j++ )); do
        third_id=$((second_id*${#TGTAR[@]}+j))

        if [ "$third_id" -eq "$SLURM_ARRAY_TASK_ID" ]; then
          SRC=${SRCAR[i]}
          TGT=${TGTAR[j]}

          if [[ "$SRC" != "$TGT" ]]; then
            echo "SRC-TGT: $SRC-$TGT"

            SOURCE_SENT=${RESULT_DIR}/${SRC}-${TGT}/test-src.txt
            HYPOTHESIS=${RESULT_DIR}/${SRC}-${TGT}/test-sys.txt
            REFERENCE=${RESULT_DIR}/${SRC}-${TGT}/test-ref.txt
            comet-score -s ${SOURCE_SENT} -t ${HYPOTHESIS} -r ${REFERENCE} --quiet --only_system > ${RESULT_DIR}/${SRC}-${TGT}/test_comet.txt
          fi
        fi

      done
    done
  done
done
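For reference, the nested loops only exist to turn the array index into a language pair. A minimal Python sketch of the same mapping, using `divmod` instead of scanning every combination, could look like this. The function name `decode_task_id` and the `main` wrapper are made up for illustration; the lists mirror the ones in the script.

```python
import os

TASKS = ["zs"]
SEEDS = [1234]
LANGS = ["de", "nl", "sv", "da", "is"]


def decode_task_id(task_id, tasks=TASKS, seeds=SEEDS, srcs=LANGS, tgts=LANGS):
    """Invert third_id = ((t * len(seeds) + s) * len(srcs) + i) * len(tgts) + j."""
    total = len(tasks) * len(seeds) * len(srcs) * len(tgts)
    if not 0 <= task_id < total:
        raise ValueError(f"task id {task_id} is outside 0..{total - 1}")
    rest, j = divmod(task_id, len(tgts))  # innermost loop varies fastest
    rest, i = divmod(rest, len(srcs))
    t, s = divmod(rest, len(seeds))
    return tasks[t], seeds[s], srcs[i], tgts[j]


def main():
    # SLURM sets this variable for every task of an array job.
    task, seed, src, tgt = decode_task_id(int(os.environ["SLURM_ARRAY_TASK_ID"]))
    if src != tgt:
        print(f"SRC-TGT: {src}-{tgt}")
```

With five source and five target languages this covers indices 0 to 24, of which the five same-language pairs are skipped, so up to 20 jobs may ask the Hub for the model at about the same time.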

Environment

OS: Linux (slurm)
comet version: newest

Smu-Tan avatar Aug 23 '23 17:08 Smu-Tan

Hmm, this seems to be a problem with downloading the model, on the HF side. Have you tried again recently?

ricardorei avatar Sep 21 '23 21:09 ricardorei

It could be that the HF Hub was down for a while.
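If the failure is a transient outage like that, retrying with a growing delay is one way to ride it out. The helper below is only a sketch: `retry` is a hypothetical name, not something COMET or huggingface_hub provides, and the delays are arbitrary defaults.

```python
import random
import time


def retry(fn, attempts=5, base_delay=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds, waiting base_delay * 2**n plus jitter between tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Jitter spreads out array tasks that all failed at the same moment.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))


# Usage (assumes unbabel-comet is installed; not executed here):
#   from comet import download_model
#   model_path = retry(lambda: download_model("Unbabel/wmt22-comet-da"),
#                      retry_on=(KeyError,))
```

The usage comment retries on KeyError because that is what the traceback in this issue ends with. A genuinely wrong model name would raise the same error, so the number of attempts should stay small.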

ricardorei avatar Sep 21 '23 21:09 ricardorei

@Smu-Tan have you solved your problem? I'm getting the same error when downloading the model.

haroon830 avatar Oct 23 '23 04:10 haroon830

@ricardorei Hi, I ran this code:

from comet import download_model, load_from_checkpoint
model_path = download_model("Unbabel/XCOMET-XL")

and get this exception:

Exception: Unbabel/XCOMET-XL is not in the available_legacy_metrics or is a valid checkpoint folder.

After checking this file, I found that available_legacy_metrics in comet/models/download_utils.py does not have the corresponding key-value pair. Can you update this file, or tell me how to download the model directly from HF?

The current version of unbabel-comet is 2.2.0. Best.

weichuanW avatar Dec 12 '23 16:12 weichuanW

Hey! Hmm, this is weird. available_legacy_metrics should only be consulted when the model is not found on Hugging Face. What is your huggingface-hub version? Can you send me the pip freeze output?
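If the full pip freeze is too noisy, the two relevant versions can also be read directly. This is a small sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the default package names are the distribution names as they appear on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("unbabel-comet", "huggingface-hub")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Calling `print(installed_versions())` in the same environment that runs comet-score shows which versions are actually being imported there.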

ricardorei avatar Dec 12 '23 16:12 ricardorei

OK, the following is the pip freeze list: accelerate==0.23.0 aeidon==1.12 aiofiles==23.2.1 aiohttp==3.8.6 aiosignal==1.3.1 altair==5.2.0 annotated-types==0.6.0 antlr4-python3-runtime==4.8 anyio==3.7.1 argh==0.30.2 async-timeout==4.0.3 atomicwrites==1.4.1 attrs==23.1.0 beautifulsoup4==4.12.2 bitarray==2.8.3 bitsandbytes==0.41.1 blessed==1.20.0 blis==0.7.11 catalogue==2.0.10 certifi==2022.12.7 cffi==1.16.0 chardet==5.2.0 charset-normalizer==2.0.12 cheroot==10.0.0 chinese-converter==1.1.1 click==8.1.7 cloudpathlib==0.16.0 cmake==3.25.0 colorama==0.4.6 coloredlogs==10.0 confection==0.1.3 contourpy==1.2.0 coverage==4.5.4 cycler==0.12.1 cymem==2.0.8 Cython==3.0.5 datasets==2.14.5 dill==0.3.7 distro==1.8.0 docstring-parser==0.15 docx2txt==0.8 einops==0.7.0 en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.0/en_core_web_lg-3.7.0-py3-none-any.whl#sha256=708da1110fbe1163d059de34a2cbedb1db65c26e1e624ca925897a2711cb7d77 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0-py3-none-any.whl#sha256=6215d71a3212690e9aec49408a27e3fe6ad7cd6c715476e93d70dc784041e93e enlighten==1.10.1 entmax==1.1 evaluate==0.4.1 exceptiongroup==1.1.3 fairseq==0.12.2 faiss==1.7.4 fastapi==0.104.1 fastbm25==0.0.2 fastBPE==0.1.1 fastest==0.3.1 fasttext==0.9.2 ffmpy==0.3.1 filelock==3.9.0 fluent.syntax==0.19.0 fonttools==4.44.0 frozenlist==1.4.0 fsspec==2023.6.0 gcld3==3.0.13 gradio==4.8.0 gradio_client==0.7.1 h11==0.14.0 httpcore==1.0.2 httpx==0.25.2 huggingface-hub==0.16.4 humanfriendly==10.0 hydra-core==1.0.7 icu==0.0.1 idna==3.4 importlib-resources==6.1.1 iniconfig==2.0.0 iniparse==0.5 jaraco.functools==3.9.0 Jinja2==3.1.2 joblib==1.3.2 jsonargparse==3.13.1 jsonschema==4.20.0 jsonschema-specifications==2023.11.2 kiwisolver==1.4.5 langcodes==3.3.0 langdetect==1.0.9 latexcodec==2.0.1 Levenshtein==0.23.0 lightning-utilities==0.9.0 lingua-language-detector==1.3.3 lit==15.0.7 lxml==4.9.3 
markdown-it-py==3.0.0 MarkupSafe==2.1.2 matplotlib==3.8.1 mdurl==0.1.2 mistletoe==1.2.1 more-itertools==10.1.0 mpmath==1.3.0 mtdata==0.4.0 multidict==6.0.4 multiprocess==0.70.15 murmurhash==1.0.10 networkx==3.0 numpy==1.24.4 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 omegaconf==2.0.6 optimum==1.13.2 orjson==3.9.10 packaging==23.2 pandas==2.1.1 pathtools==0.1.2 peft @ git+https://github.com/huggingface/peft@56556faa17263be8ef1802c172141705b71c28dc phply==1.2.6 Pillow==9.3.0 pluggy==0.13.1 ply==3.11 polyglot==16.7.4 portalocker==2.3.0 prefixed==0.7.0 preshed==3.0.9 protobuf==4.24.4 psutil==5.9.6 py==1.11.0 pyarrow==13.0.0 pybind11==2.11.1 pybtex==0.24.0 pycld2==0.42 pycparser==2.21 pydantic==2.4.2 pydantic_core==2.10.1 pydub==0.25.1 pyenchant==3.2.2 Pygments==2.17.2 PyICU==2.11 pyparsing==3.1.1 pytest==4.6.11 pytest-cov==2.10.1 python-dateutil==2.8.2 python-Levenshtein==0.23.0 python-multipart==0.0.6 pytorch-lightning==2.1.0 pytz==2023.3.post1 PyYAML==6.0.1 rank-bm25==0.2.2 rapidfuzz==3.4.0 referencing==0.32.0 regex==2023.10.3 requests==2.28.1 responses==0.18.0 rich==13.7.0 rpds-py==0.13.2 ruamel.yaml==0.17.32 ruamel.yaml.clib==0.2.8 sacrebleu==2.3.1 sacremoses==0.0.53 safetensors==0.4.0 scikit-build==0.17.6 scipy==1.11.3 seaborn==0.13.0 semantic-version==2.10.0 sentencepiece==0.1.99 shellingham==1.5.4 shtab==1.6.4 six==1.16.0 smart-open==6.4.0 sniffio==1.3.0 soupsieve==2.5 spacy==3.7.2 spacy-language-detection==0.2.1 spacy-legacy==3.0.12 spacy-loggers==1.0.5 srsly==2.4.8 starlette==0.27.0 sympy==1.12 tabulate==0.9.0 thinc==8.2.1 tokenizers==0.14.1 tomli==2.0.1 tomlkit==0.12.0 toolz==0.12.0 torch==2.0.1 torchaudio==2.0.2+cu117 torchmetrics==0.10.3 torchvision==0.15.2+cu117 
tqdm==4.66.1 transformers==4.34.1 translate-toolkit==3.10.1 transliterate==1.10.2 triton==2.0.0 trl==0.7.4 typer==0.9.0 typing_extensions==4.8.0 tyro==0.5.17 tzdata==2023.3 unbabel-comet==2.2.0 urllib3==1.26.13 uvicorn==0.24.0.post1 vobject==0.9.6.1 wasabi==1.1.2 watchdog==0.9.0 wcwidth==0.2.8 weasel==0.3.3 websockets==11.0.3 wmtformat @ git+https://github.com/wmt-conference/wmt-format-tools.git@49983f17d8c99207c66a7f43fa49aa71d0692e48 xxhash==3.4.1 yarl==1.9.2 zhon==2.0.2

The Hugging Face Hub version is huggingface-hub==0.16.4. I upgraded it to huggingface-hub 0.19.4, but it still fails with the same error :)

weichuanW avatar Dec 13 '23 00:12 weichuanW

The problem was solved by manually downloading the model from the Hugging Face repo. Thanks.
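For anyone who wants to script that manual download, here is a rough sketch. It relies on `snapshot_download` from huggingface_hub (the call COMET itself makes in the traceback at the top of this issue) and on `load_from_checkpoint` from comet. The `checkpoints/model.ckpt` location inside the snapshot is an assumption about the repo layout, and both function names below are made up for this example.

```python
import os


def checkpoint_in_snapshot(snapshot_dir):
    """Path of the checkpoint inside a downloaded snapshot.

    Assumes the weights live at checkpoints/model.ckpt, which is an
    assumption about the repo layout and should be checked per model.
    """
    path = os.path.join(snapshot_dir, "checkpoints", "model.ckpt")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no checkpoint at {path}")
    return path


def fetch_and_load(repo_id, cache_dir=None):
    # Imported lazily so the helper above works without these packages.
    from huggingface_hub import snapshot_download
    from comet import load_from_checkpoint

    snapshot_dir = snapshot_download(repo_id=repo_id, cache_dir=cache_dir)
    return load_from_checkpoint(checkpoint_in_snapshot(snapshot_dir))
```

Running `fetch_and_load("Unbabel/XCOMET-XL")` once on a node with internet access should leave the files in the local cache, so later runs do not depend on the Hub being reachable at that moment.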

weichuanW avatar Dec 21 '23 09:12 weichuanW

You have to accept the model's license on its Hugging Face web page first. Then log in with the Hugging Face CLI before your code downloads the model.

mohataher avatar Mar 28 '24 01:03 mohataher

I forgot about this issue. Thanks for answering @mohataher.

ricardorei avatar Mar 28 '24 09:03 ricardorei

SOLVED: I had the same issue ('Unbabel/wmt23-cometkiwi-da-xl' not supported by COMET), and it turned out to be a problem with logging in to Hugging Face. If you have the Hugging Face CLI installed, go to huggingface.co/settings/tokens to generate a token, then run huggingface-cli login and paste in the token. If you run the code again, it should now download the model successfully.
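On a cluster, where an interactive prompt is awkward, the same login can be done from Python. The sketch below uses `login` from huggingface_hub. Reading the token from an environment variable named HF_TOKEN is a choice made for this example, and `hub_token` and `login_from_env` are hypothetical helper names.

```python
import os


def hub_token(env=os.environ):
    """Return the access token from HF_TOKEN, or fail with a pointer to the settings page."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; create a token at huggingface.co/settings/tokens"
        )
    return token


def login_from_env():
    # Lazy import: huggingface_hub is only needed when actually logging in.
    from huggingface_hub import login

    login(token=hub_token())
```

Call `login_from_env()` before `download_model(...)`. For gated models the license still has to be accepted on the model page first, as described above.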

laelhalawani avatar Jun 19 '24 09:06 laelhalawani