transformers
Error when using AutoTokenizer to load local files without network
System Info
- transformers version: 4.42.3
- Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
- Accelerate version: 0.31.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA A100-PCIE-40GB
Who can help?
@ArthurZucker
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Here is my analysis, followed by the steps to reproduce:

I examined the stack trace from step 4 below and found that the issue likely stems from line 505 of `transformers/dynamic_module_utils.py`, inside the `get_class_from_dynamic_module` function: the first argument, `repo_id`, is set incorrectly when calling `get_cached_module_file`. It should be the `pretrained_model_name_or_path` parameter (in my case, `/home/xx/chatglm3-6b`), but it instead receives `THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer`, as assigned on line 497.

I believe the logic in lines 496-499 needs adjustment: when `pretrained_model_name_or_path` is a local file path, `repo_id` should be set directly to `pretrained_model_name_or_path`. Whether or not my analysis is correct, I would like to fix this issue myself and contribute it back to the project.
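The adjustment I have in mind can be sketched as follows. Note that `resolve_repo_id` is a hypothetical helper, not the actual transformers code; the `--` splitting of the class reference is inferred from the values observed in the trace:

```python
import os


def resolve_repo_id(class_reference: str, pretrained_model_name_or_path: str) -> str:
    """Hypothetical sketch of the proposed fix (not the real transformers code).

    Mirrors the repo-id handling in get_class_from_dynamic_module, but prefers
    a local directory over the repo id embedded in the class reference, so that
    offline loading never needs to reach the Hub.
    """
    if "--" in class_reference:
        # e.g. "THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer"
        repo_id, class_reference = class_reference.split("--")
    else:
        repo_id = pretrained_model_name_or_path
    # Proposed adjustment: if the user passed a local directory, resolve the
    # remote-code module from that directory instead of the Hub repo id.
    if os.path.isdir(pretrained_model_name_or_path):
        repo_id = pretrained_model_name_or_path
    return repo_id
```

With this change, `resolve_repo_id("THUDM/chatglm3-6b--tokenization_chatglm.ChatGLMTokenizer", "/home/xx/chatglm3-6b")` would return the local directory whenever it exists on disk, and fall back to `THUDM/chatglm3-6b` otherwise.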
- The server cannot connect to the network.
- Cloned https://huggingface.co/THUDM/chatglm3-6b on a proxy machine and copied it from there to /home/xx/chatglm3-6b on the server.
- Ran the following code:

  ```python
  from transformers import AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained('/home/xx/chatglm3-6b', trust_remote_code=True)
  ```
- Encountered the following error:
```
Could not locate the tokenization_chatglm.py inside THUDM/chatglm3-6b.
Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
  File "{}/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "{}/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
    raise new_e
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 615, in connect
    self.sock = sock = self._new_conn()
  File "{}/lib/python3.10/site-packages/urllib3/connection.py", line 211, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "{}/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "{}/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm3-6b/resolve/main/tokenization_chatglm.py (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 395, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "{}/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "{}/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 66, in send
    return super().send(request, *args, **kwargs)
  File "{}/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /THUDM/chatglm3-6b/resolve/main/tokenization_chatglm.py (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefc286a7a0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: a2a5cb2f-dfdd-4747-aad0-fe648d2bfc70)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "{}/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "{}/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "{}/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1826, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "{}/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 871, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "{}/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 505, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "{}/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 308, in get_cached_module_file
    resolved_module_file = cached_file(
  File "{}/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like THUDM/chatglm3-6b is not the path to a directory containing a file named tokenization_chatglm.py.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
```
Expected behavior
The tokenizer should load correctly from the local directory, without requiring any network access.