
[Question]: I need to deploy the PP-UIE model in an offline environment. I put my trained model in uer/.paddlenlp/model/paddlenlp/PP-uie-7B, but every information-extraction run still needs to download shards. How can I fix this?

Open anxiangyi opened this issue 9 months ago • 8 comments

Please describe your question

I want to use the PP-UIE model in an offline environment, but every information-extraction run needs to download shards.

anxiangyi avatar Apr 16 '25 09:04 anxiangyi

You can download the model for offline use here: https://paddlenlp.readthedocs.io/zh/latest/model_list.html

ZHUI avatar Apr 18 '25 06:04 ZHUI

You can download the model for offline use here: https://paddlenlp.readthedocs.io/zh/latest/model_list.html

Sorry, I see that link no longer works. And after downloading the model offline, how do I actually use it?

anxiangyi avatar Apr 25 '25 04:04 anxiangyi

Sorry, it was updated recently. https://paddlenlp.readthedocs.io/zh/latest/website/index.html

ZHUI avatar Apr 25 '25 10:04 ZHUI

Sorry, it was updated recently. https://paddlenlp.readthedocs.io/zh/latest/website/index.html

You may have misunderstood me. I am using my own model: Taskflow('information_extraction', schema=self.schema, model='paddlenlp/PP-UIE-0.5B', precision='float32'). I put my trained model under .paddlenlp so that my own trained model is used, but when I run offline it errors out:

[2025-04-27 10:01:23,559] [ INFO] - The unk_token parameter needs to be defined: we use eos_token by default.
[2025-04-27 10:01:23,764] [ INFO] - We are using <class 'paddlenlp.transformers.qwen2.modeling.Qwen2ForCausalLM'> to load 'paddlenlp/PP-UIE-0.5B'.
[2025-04-27 10:01:23,764] [ INFO] - Loading configuration file C:\Users\lenovo.paddlenlp\models\paddlenlp/PP-UIE-0.5B\config.json
[2025-04-27 10:01:23,765] [ INFO] - Loading weights file from cache at C:\Users\lenovo.paddlenlp\models\paddlenlp/PP-UIE-0.5B\model.safetensors.index.json
Downloading shards: 100%|██████████| 1/1 [00:00<00:00, 1763.79it/s]
[2025-04-27 10:02:18,967] [ INFO] - All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[2025-04-27 10:02:18,967] [ INFO] - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at paddlenlp/PP-UIE-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[2025-04-27 10:02:18,975] [ INFO] - Loading configuration file C:\Users\lenovo.paddlenlp\models\paddlenlp/PP-UIE-0.5B\generation_config.json

I suspect the log line `Downloading shards: 100%|██████████| 1/1 [00:00<00:00, 1763.79it/s]` means shards are fetched on every run, which is why it fails when there is no network.

anxiangyi avatar Apr 27 '25 02:04 anxiangyi
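For a fully offline run, every file the loader tries to resolve must already exist in the local cache; otherwise PaddleNLP falls back to a network request like the one failing here. A minimal stdlib sketch to audit a cache directory before going offline — the file list is an assumption pieced together from the logs in this thread, not an authoritative manifest:

```python
from pathlib import Path

# Assumed file set, inferred from the log output in this thread.
# Adjust it to whatever your model variant actually ships with.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "model.safetensors.index.json",
    "tokenizer_config.json",
    "chat_template.json",  # the file that triggered the network request above
]

def missing_files(model_dir: str) -> list:
    """Return the expected files that are absent from the local model directory."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]

# Example against a hypothetical cache path:
# print(missing_files(r"C:\Users\lenovo\.paddlenlp\models\paddlenlp\PP-UIE-0.5B"))
```

If this reports any missing file, fetch it while you still have network access; an empty result makes it much more likely the loader stays local.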

Sorry, it was updated recently. https://paddlenlp.readthedocs.io/zh/latest/website/index.html

Here is the error it reports when I run offline:

Traceback (most recent call last):
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\connectionpool.py", line 1093, in _validate_conn
    conn.connect()
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\connection.py", line 741, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\connection.py", line 920, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\util\ssl_.py", line 460, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\util\ssl_.py", line 504, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "D:\anaconda\envs\my_nlp\lib\ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "D:\anaconda\envs\my_nlp\lib\ssl.py", line 1104, in _create
    self.do_handshake()
  File "D:\anaconda\envs\my_nlp\lib\ssl.py", line 1375, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    response = self._make_request(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\connectionpool.py", line 488, in _make_request
    raise new_e
urllib3.exceptions.SSLError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\envs\my_nlp\lib\site-packages\requests\adapters.py", line 667, in send
    resp = conn.urlopen(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\connectionpool.py", line 841, in urlopen
    retries = retries.increment(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\urllib3\util\retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='bj.bcebos.com', port=443): Max retries exceeded with url: /paddlenlp/models/community/paddlenlp/PP-UIE-0.5B/chat_template.json (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\utils\download\__init__.py", line 238, in resolve_file_path
    is_available = bos_aistudio_hf_file_exist(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\utils\download\__init__.py", line 331, in bos_aistudio_hf_file_exist
    out = bos_file_exists(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\utils\download\bos_download.py", line 269, in bos_file_exists
    get_bos_file_metadata(url, token=token)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\utils\download\bos_download.py", line 100, in get_bos_file_metadata
    r = _request_wrapper(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\utils\download\common.py", line 346, in _request_wrapper
    response = _request_wrapper(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\utils\download\common.py", line 368, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\requests\adapters.py", line 698, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='bj.bcebos.com', port=443): Max retries exceeded with url: /paddlenlp/models/community/paddlenlp/PP-UIE-0.5B/chat_template.json (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\projects\contract_nlp_project\utils\pdf2txt.py", line 243, in <module>
    a = File2Txt(r'F:\projects\contract_nlp_project\utils\2024-1.pdf', r'F:\projects\contract_nlp_project\utils')
  File "F:\projects\contract_nlp_project\utils\pdf2txt.py", line 38, in __init__
    self.ie = Taskflow('information_extraction', schema=self.schema, model='paddlenlp/PP-UIE-0.5B',
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\taskflow\taskflow.py", line 869, in __init__
    self.task_instance = task_class(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\taskflow\information_extraction.py", line 150, in __init__
    self._construct_tokenizer(model)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\taskflow\information_extraction.py", line 178, in _construct_tokenizer
    self._tokenizer = AutoTokenizer.from_pretrained(model)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\transformers\auto\tokenizer.py", line 478, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\transformers\tokenizer_utils.py", line 835, in from_pretrained
    tokenizer, tokenizer_config_file_dir = super().from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\transformers\tokenizer_utils_base.py", line 1590, in from_pretrained
    resolved_vocab_files[file_id] = resolve_file_path(
  File "D:\anaconda\envs\my_nlp\lib\site-packages\paddlenlp\utils\download\__init__.py", line 286, in resolve_file_path
    raise EnvironmentError(
OSError: Can't load the model for 'paddlenlp/PP-UIE-0.5B'. If you were trying to load it from 'BOS', make sure you don't have a local directory with the same name. Otherwise, make sure 'paddlenlp/PP-UIE-0.5B' is the correct path to a directory containing one of the ['chat_template.json']

anxiangyi avatar Apr 27 '25 02:04 anxiangyi

Hi, could you try creating an empty chat_template.json file locally?

ZHUI avatar Apr 27 '25 09:04 ZHUI
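The suggestion above can be sketched as follows. This is a hedged variant: instead of a zero-byte file it writes a minimal valid JSON object (`{}`), since a truly empty file is not valid JSON and can raise JSONDecodeError if the loader parses it. Whether PaddleNLP accepts `{}` as a chat template is an assumption, and the cache path in the usage comment is hypothetical.

```python
import json
from pathlib import Path

def ensure_chat_template(model_dir: str) -> Path:
    """Create chat_template.json inside model_dir if it is missing.

    Writes a minimal valid JSON object ({}) rather than a zero-byte file:
    an empty file is not valid JSON and may fail later parsing.
    """
    path = Path(model_dir) / "chat_template.json"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({}), encoding="utf-8")
    return path

# Example against a hypothetical cache path (adjust to your setup):
# ensure_chat_template(r"C:\Users\lenovo\.paddlenlp\models\paddlenlp\PP-UIE-0.5B")
```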

Has this been resolved? I am also running into a similar problem in an offline environment.


tszxyuan123 avatar Jun 13 '25 06:06 tszxyuan123

Indeed, creating an empty chat_template.json file locally fixed it.

tszxyuan123 avatar Jun 13 '25 09:06 tszxyuan123

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Aug 13 '25 00:08 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Aug 28 '25 00:08 github-actions[bot]

Indeed, creating an empty chat_template.json file locally fixed it.

Where should this empty file go? I put it under the model directory (PP-UIE-1.5B), but running still fails with:

  File "/usr/local/lib/python3.12/json/decoder.py", line 356, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

ZRX19 avatar Oct 28 '25 05:10 ZRX19
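The `Expecting value: line 1 column 1 (char 0)` error is exactly what Python's json module raises on empty input, which suggests the zero-byte chat_template.json itself is being parsed. A quick stdlib demonstration; writing `{}` into the file instead of leaving it empty avoids this particular parse error (whether PaddleNLP then accepts the empty template is an assumption):

```python
import json

# An empty string is not valid JSON and reproduces the error from the
# comment above; a file containing "{}" parses to an empty object.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)

print(json.loads("{}"))  # {}
```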