mindnlp

Tokenizer loading fails when running offline

Open bluenessdragon opened this issue 9 months ago • 2 comments

Describe the bug (Mandatory): T5Tokenizer.from_pretrained(strTokenizer, cache_dir=cache_dir, local_files_only=True). In the from_pretrained call, local_files_only is set to True, yet with the network disconnected the call still tries to request files from the external network.

  • Hardware Environment (Ascend/GPU/CPU): CPU on Windows

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx): 2.4.0
    -- Python version (e.g., Python 3.7.5): 3.10

  • Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved:

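To Reproduce (Mandatory) Steps to reproduce the behavior:

  1. Run this statement in Python: T5Tokenizer.from_pretrained('google-t5/t5-small', cache_dir=cache_dir, local_files_only=True)
  2. The statement has already been run once.
  3. Turn off the local network connection.
  4. Run the statement again; it fails with an error while trying to connect to the external network.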
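The steps above can be sketched as a small script. This is only a minimal sketch: it assumes `T5Tokenizer` is imported from `mindnlp.transformers`, and the `./t5_cache` path and the helper names `offline_kwargs` and `load_tokenizer` are placeholders that do not appear in the original report.

```python
from pathlib import Path

MODEL_ID = "google-t5/t5-small"   # model id taken from the report
CACHE_DIR = Path("./t5_cache")    # placeholder cache location (assumption)


def offline_kwargs(cache_dir):
    """Keyword arguments the report passes to from_pretrained."""
    return {"cache_dir": str(cache_dir), "local_files_only": True}


def load_tokenizer(cache_dir=CACHE_DIR):
    """Load the tokenizer with the same arguments as in the report."""
    # Imported lazily so the module can be read without mindnlp installed.
    # Assumption: T5Tokenizer is exposed by mindnlp.transformers.
    from mindnlp.transformers import T5Tokenizer

    return T5Tokenizer.from_pretrained(MODEL_ID, **offline_kwargs(cache_dir))
```

Calling `load_tokenizer()` once with the network up, then again with the network disabled, corresponds to steps 2 to 4.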

Expected behavior (Mandatory) A clear and concise description of what you expected to happen. Tokenizer files that have already been downloaded should not need to be downloaded again when offline.

Screenshots / Logs (Mandatory) If applicable, add screenshots to help explain your problem.

Traceback (most recent call last):
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 667, in send
    resp = conn.urlopen(
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 841, in urlopen
    retries = retries.increment(
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /google-t5/t5-small/resolve/main/added_tokens.json?download=true (Caused by ProxyError('Unable to connect to proxy', FileNotFoundError(2, 'No such file or directory')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\mindnlp\utils\download.py", line 503, in cached_file
    resolved_file = download(
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\mindnlp\utils\download.py", line 627, in download
    pointer_path = http_get(url, storage_folder, download_file_name=relative_filename, proxies=proxies, headers=headers)
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\mindnlp\utils\download.py", line 198, in http_get
    req = requests.get(url, stream=True, timeout=10, proxies=proxies, headers=headers)
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\bluen\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 694, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='hf-mirror.com', port=443): Max retries exceeded with url: /google-t5/t5-small/resolve/main/added_tokens.json?download=true (Caused by ProxyError('Unable to connect to proxy', FileNotFoundError(2, 'No such file or directory')))
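Both tracebacks end in `ProxyError('Unable to connect to proxy', ...)`, so it may help to see which proxy settings Python picks up on this machine. The following is a minimal sketch using only the standard library; the function name `proxy_report` is hypothetical, and whether mindnlp or `requests` ends up using exactly these settings in this environment is an assumption.

```python
import os
from urllib.request import getproxies


def proxy_report():
    """Collect the proxy settings visible to Python's standard library."""
    env_names = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")
    env = {}
    for name in env_names:
        # Look up both the upper-case and lower-case spelling.
        value = os.environ.get(name) or os.environ.get(name.lower())
        if value:
            env[name] = value
    # getproxies() also consults system-level settings where the platform
    # supports it (for example the Windows registry).
    return {"environment": env, "detected": getproxies()}
```

Printing `proxy_report()` before the failing call shows whether a proxy is configured at all; an unexpected entry there would be one candidate explanation for the proxy error.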

Additional context (Optional) Add any other context about the problem here.

bluenessdragon, Feb 18 '25 09:02

This is a network problem; you need to switch the mirror or use a VPN or proxy to get around the network restriction.

lvyufeng, Feb 26 '25 03:02

Is cache_dir not taking effect?
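One way to check this would be to look at what is actually stored under `cache_dir`. The sketch below uses only the standard library and makes no assumption about how mindnlp lays out its cache; it simply searches the folder recursively for a given file name, such as `added_tokens.json`, the file requested in the traceback. The function name `find_cached` is hypothetical.

```python
from pathlib import Path


def find_cached(cache_dir, filename):
    """Return every file named `filename` found anywhere under `cache_dir`."""
    root = Path(cache_dir)
    if not root.is_dir():
        # Missing or non-directory cache path: nothing can be cached there.
        return []
    return sorted(p for p in root.rglob(filename) if p.is_file())
```

If `find_cached(cache_dir, "added_tokens.json")` returns an empty list while other tokenizer files are found, the requested file was never cached, which could explain why a download is attempted even with local_files_only=True.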

lvyufeng, Feb 26 '25 03:02