Issue with blobfile installing leading to non-deterministic failures on CI

Open metascroy opened this issue 9 months ago • 3 comments

blobfile does not install correctly with pip: pip reports the package as already installed, but importing it fails.

(cchat) scroy@scroy-mbp torchchat % which python
/opt/miniconda3/envs/cchat/bin/python


(cchat) scroy@scroy-mbp torchchat % pip install blobfile
Requirement already satisfied: blobfile in /opt/miniconda3/envs/cchat/lib/python3.10/site-packages (2.1.1)
Requirement already satisfied: pycryptodomex~=3.8 in /opt/miniconda3/envs/cchat/lib/python3.10/site-packages (from blobfile) (3.20.0)
Requirement already satisfied: urllib3<3,>=1.25.3 in /opt/miniconda3/envs/cchat/lib/python3.10/site-packages (from blobfile) (2.2.1)
Requirement already satisfied: lxml~=4.9 in /opt/miniconda3/envs/cchat/lib/python3.10/site-packages (from blobfile) (4.9.4)
Requirement already satisfied: filelock~=3.0 in /opt/miniconda3/envs/cchat/lib/python3.10/site-packages (from blobfile) (3.14.0)


python -c "import blobfile"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/miniconda3/envs/cchat/lib/python3.10/site-packages/blobfile/__init__.py", line 6, in <module>
    from blobfile._ops import (
  File "/opt/miniconda3/envs/cchat/lib/python3.10/site-packages/blobfile/_ops.py", line 19, in <module>
    from blobfile._common import DirEntry, Stat, RemoteOrLocalPath
  File "/opt/miniconda3/envs/cchat/lib/python3.10/site-packages/blobfile/_common.py", line 30, in <module>
    from blobfile import _xml as xml
  File "/opt/miniconda3/envs/cchat/lib/python3.10/site-packages/blobfile/_xml.py", line 6, in <module>
    from lxml import etree
ImportError: dlopen(/opt/miniconda3/envs/cchat/lib/python3.10/site-packages/lxml/etree.cpython-310-darwin.so, 0x0002): Library not loaded: @rpath/libxml2.2.dylib
  Referenced from: <CF7C533F-0E7E-3AE3-856A-7C6D160B1AA9> /opt/miniconda3/envs/cchat/lib/python3.10/site-packages/lxml/etree.cpython-310-darwin.so
  Reason: tried: '/opt/miniconda3/envs/ctbench/lib/libxml2.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/miniconda3/envs/ctbench/lib/libxml2.2.dylib' (no such file), '/opt/miniconda3/envs/ctbench/lib/libxml2.2.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/miniconda3/envs/ctbench/lib/libxml2.2.dylib' (no such file), '/opt/miniconda3/envs/cchat/bin/../lib/libxml2.2.dylib' (no such file), '/opt/miniconda3/envs/cchat/bin/../lib/libxml2.2.dylib' (no such file)
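
To confirm the broken install independently of torchchat, it can help to import the affected modules directly and print the loader error. The traceback above shows the loader trying paths under `envs/ctbench/lib` as well as `envs/cchat`. Below is a minimal sketch of such a check; the helper name `check_import` is hypothetical and is not part of torchchat or blobfile.

```python
import importlib


def check_import(module_name):
    """Try to import a module and return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
        return True, None
    except ImportError as exc:
        # A shared library that fails to load (such as libxml2 in the
        # traceback above) surfaces in Python as an ImportError.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # lxml.etree is the module that fails in the traceback; blobfile imports it.
    for name in ("lxml.etree", "blobfile"):
        ok, err = check_import(name)
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if err:
            print(f"  {err}")
```

Running this at the start of a CI job would show whether a later tokenizer error coincides with a failed `blobfile` import, assuming the job uses the same Python environment as the check.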

This results in errors like "RuntimeError: model-specified tokenizer (SentencePiece) does not match provided tokenizer (SentencePiece for model" because of the tokenizer selection logic here: https://github.com/pytorch/torchchat/blob/main/build/builder.py#L163

It appears to cause non-deterministic failures in our CI Mac MPS job:

  • Failed example: https://github.com/pytorch/torchchat/actions/runs/8902280895/job/24447902156?pr=598
  • Succeeded on retry: https://github.com/pytorch/torchchat/actions/runs/8903180954/job/24450476935?pr=594

(Unfortunately, the two runs above are from different PRs, so they are not the best illustration, but the failed job often succeeds on retry.)

metascroy • May 01 '24 00:05