PaddleNLP
[Bug]: Hierarchical multi-label classification model trained on GPU fails at inference after CPU deployment
Software environment
# pip list | grep paddle
paddle-bfloat 0.1.7
paddle2onnx 1.0.9
paddlefsl 1.1.0
paddlenlp 2.5.2
paddlepaddle-gpu 2.5.1.post112
Duplicate check
- [X] I have searched the existing issues
Bug description
app = SimpleServer()
app.register(
    "models/cls_hierarchical",
    model_path=f'{model_dir}/export',
    tokenizer_name=model_name,
    model_handler=CustomModelHandler,
    post_handler=MultiLabelClassificationPostHandler,
    device_id=-1
)
After exporting the model trained in a GPU environment and deploying it on CPU as shown above (device_id=-1), the server reports a successful deployment.
1st call to the inference REST API: the inference result is returned correctly.
2nd call to the inference REST API: an empty array is returned.
3rd call to the inference REST API: an internal service error is raised (Internal service error).
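For reference, a minimal client sketch approximating the Postman calls, assuming the default simple_serving request schema; the host, port, sample text and max_seq_len below are placeholders, not values taken from this report:

import json
import requests

# Hypothetical endpoint: the path matches the name passed to app.register above;
# host and port are placeholders for wherever the SimpleServer instance is running.
url = "http://0.0.0.0:8189/models/cls_hierarchical"
headers = {"Content-Type": "application/json"}

payload = {
    "data": {"text": ["待分类的一条示例文本"]},
    "parameters": {"max_seq_len": 128, "batch_size": 1},
}

# Call the endpoint three times to mirror the behaviour described above:
# 1st call returns labels, 2nd returns an empty array, 3rd fails with an internal error.
for i in range(3):
    resp = requests.post(url, headers=headers, data=json.dumps(payload))
    print(f"call {i + 1}: status={resp.status_code}, body={resp.text}")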
Steps to reproduce & code
- Environment info
Linux version 3.10.0-1160.92.1.el7.x86_64
Tesla T4 16G
NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8
- Reproduction steps
1. Train the model on GPU.
2. Export the model (see the export sketch after this list).
3. Deploy it with simple_serving, using device_id=-1.
4. Call the API with Postman.
5. The internal error is raised on the 3rd call.
Repeating steps 3-5 above reproduces the problem every time.
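For context on step 2, here is a sketch of what the export step does, assuming the standard dynamic-to-static export used in the PaddleNLP text classification examples (normally driven by the example's export_model.py); the checkpoint path, output path, saved file prefix, and model class below are assumptions, not taken from this report:

import os
import paddle
from paddlenlp.transformers import AutoModelForSequenceClassification

# Placeholder paths: the fine-tuned GPU checkpoint and the directory later
# passed as model_path to app.register.
params_path = "./checkpoint"
output_path = "./export"

model = AutoModelForSequenceClassification.from_pretrained(params_path)
model.eval()

# Convert the dynamic-graph model to a static graph with variable batch/sequence dims.
input_spec = [
    paddle.static.InputSpec(shape=[None, None], dtype="int64", name="input_ids"),
    paddle.static.InputSpec(shape=[None, None], dtype="int64", name="token_type_ids"),
]
static_model = paddle.jit.to_static(model, input_spec=input_spec)

# Saves <prefix>.pdmodel / <prefix>.pdiparams; the "model" prefix here is an assumption.
paddle.jit.save(static_model, os.path.join(output_path, "model"))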
@wj-Mcat could you take a look? Thanks!
I've hit the same problem: I also trained a hierarchical multi-label classification model on GPU, exported it following the official tutorial, and deployed it with simple_serving. I send only one sentence per request, and across repeated requests 9 out of 10 return an empty result; when something is returned the result is wrong, sometimes several labels come back at once, and the result changes from request to request. @abbydev, I saw you were already asking about hierarchical text classification back in March or April; you're only getting to deployment now in December? Are you a student?
I really want to support a domestic framework, but the Paddle community is not very active and the maintainers don't respond. @imempty I first used the Paddle framework in March-April, and the GPU deployment went live long ago. The problem I'm facing now is: I trained the model in a GPU environment, exported it, and want to deploy it in a CPU environment, and I run into this error: MemoryError: (ResourceExhausted) Fail to alloc memory of 8245807622825612480 size, error code is 12. [Hint: Expected error == 0, but received error:12 != 0:0.] (at /paddle/paddle/fluid/memory/allocation/cpu_allocator.cc:50) [operator < fill_constant > error]
I also searched similar issues for this error; it looks like a version-matching problem, but shouldn't a newer framework version be backward compatible?
Does simple_serving + GPU deployment work correctly for you? Has the simple_serving + CPU error described at the top of this issue been resolved? Your new error looks like running out of memory, but that size value looks enormous!
@imempty "Does simple_serving + GPU deployment work correctly for you?" ==== Yes. "Has the simple_serving + CPU error described at the top of this issue been resolved?" ==== The server starts, but after three calls it throws an internal error. "Your new error looks like running out of memory, but that size value looks enormous!" ===== https://github.com/PaddlePaddle/PaddleNLP/issues/7231, but it didn't help when I tried it. My understanding is that an exported model should be decoupled from the framework, and a newer Paddle version should be backward compatible, but in practice that is not the case. I'm almost tempted to switch to PyTorch: when a domestic framework hits problems, the support is insufficient and no official maintainer even shows up. This has been Baidu's recurring problem for over a decade, a strong start with weak follow-through... Forgive me if my words sound harsh, but that is how it is.
I just checked again: simple_serving doesn't have an explicit option to enable GPU inference, does it? At least there is nothing in its service.py or client.py. I originally chose Paddle for its turnkey integration; the tutorials make it look like following the official steps is enough to get the job done. In practice there are a lot of bugs, and the official project code doesn't run as-is. If questions keep going unanswered, it will end up as the next mxnet.
@imempty "an explicit option to enable GPU inference" =====> this does exist; I found it by reading the source code. device_id=-1 means CPU.
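To make that concrete, a sketch of the register call from the top of this issue with only the device selection changed; the meaning of device_id=-1 is as stated above, while device_id=0 selecting the first GPU is an assumption inferred from this discussion, not confirmed by the source:

# Same registration as at the top of this issue; only device_id differs.
# device_id=-1 -> CPU inference (as found in the source by the commenter above);
# device_id=0  -> first GPU (assumption, consistent with how the option is described here).
app.register(
    "models/cls_hierarchical",
    model_path=f"{model_dir}/export",
    tokenizer_name=model_name,
    model_handler=CustomModelHandler,
    post_handler=MultiLabelClassificationPostHandler,
    device_id=0,  # use -1 here to fall back to CPU
)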
Here is the full error:
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/xxx/lib/python3.7/site-packages/uvicorn/protocols/http/h11_impl.py", line 429, in run_asgi
self.scope, self.receive, self.send
File "/home/xxx/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/fastapi/applications.py", line 292, in __call__
await super().__call__(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/xxx/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
raise e
File "/home/xxx/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
await self.app(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/home/xxx/lib/python3.7/site-packages/fastapi/routing.py", line 274, in app
dependant=dependant, values=values, is_coroutine=is_coroutine
File "/home/xxx/lib/python3.7/site-packages/fastapi/routing.py", line 192, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/home/xxx/lib/python3.7/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/xxx/lib/python3.7/site-packages/anyio/to_thread.py", line 34, in run_sync
func, *args, cancellable=cancellable, limiter=limiter
File "/home/xxx/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/xxx/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/http_router/router.py", line 61, in predict
result = self._app._model_manager.predict(inference_request.data, inference_request.parameters)
File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/model_manager.py", line 94, in predict
model_output = self._model_handler(self._predictor_list[predictor_id], self._tokenizer, data, parameters)
File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/handlers/custom_model_handler.py", line 73, in process
predictor._predictor.run()
MemoryError: (ResourceExhausted) Fail to alloc memory of 8245807622825612480 size, error code is 12.
[Hint: Expected error == 0, but received error:12 != 0:0.] (at /paddle/paddle/fluid/memory/allocation/cpu_allocator.cc:50)
[operator < fill_constant > error]
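One way to narrow this down is to load the exported model directly with the Paddle inference API on CPU, bypassing simple_serving entirely: if the same fill_constant allocation error appears here, the problem is in the exported program or the inference engine rather than the server layer. A minimal sketch, assuming the export produced model.pdmodel / model.pdiparams under the export directory and that the tokenizer matches the one used for training; the paths, tokenizer name, sample text, and input names are placeholders:

import numpy as np
from paddle.inference import Config, create_predictor
from paddlenlp.transformers import AutoTokenizer

# Placeholder paths/names; adjust to the actual export directory and tokenizer.
config = Config("export/model.pdmodel", "export/model.pdiparams")
config.disable_gpu()  # force CPU inference, matching device_id=-1

predictor = create_predictor(config)
tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh")

encoded = tokenizer(["待分类的一条示例文本"], max_length=128, padding=True, truncation=True)
inputs = {
    "input_ids": np.array(encoded["input_ids"], dtype="int64"),
    "token_type_ids": np.array(encoded["token_type_ids"], dtype="int64"),
}

# Feed each named input, run the static graph, and fetch the logits.
# Assumes the exported graph's inputs are named input_ids / token_type_ids.
for name in predictor.get_input_names():
    handle = predictor.get_input_handle(name)
    handle.copy_from_cpu(inputs[name])
predictor.run()
logits = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(logits.shape)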
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.