
[Bug]: Inference fails after a hierarchical multi-label classification model trained on GPU is deployed on CPU

Open abbydev opened this issue 1 year ago • 10 comments

Software Environment

# pip list | grep paddle
paddle-bfloat      0.1.7
paddle2onnx        1.0.9
paddlefsl          1.1.0
paddlenlp          2.5.2
paddlepaddle-gpu   2.5.1.post112

Duplicate Issues

  • [X] I have searched the existing issues

Error Description

from paddlenlp import SimpleServer
from paddlenlp.server import CustomModelHandler, MultiLabelClassificationPostHandler

# model_dir and model_name are defined elsewhere in the deployment script.
app = SimpleServer()
app.register(
    "models/cls_hierarchical",
    model_path=f'{model_dir}/export',   # exported static-graph model
    tokenizer_name=model_name,
    model_handler=CustomModelHandler,
    post_handler=MultiLabelClassificationPostHandler,
    device_id=-1,                       # -1 = run inference on CPU
)
The model was trained in a GPU environment and exported, then deployed on CPU with the code above (device_id=-1); the deployment itself reports success.
1st call to the inference REST API: the prediction is returned correctly.
2nd call: an empty array is returned.
3rd call: it fails with "Internal service error".

Steps to Reproduce & Code

  • Environment info
Linux version 3.10.0-1160.92.1.el7.x86_64
Tesla T4 16G
NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8
  • Reproduction steps
1. Train the model on GPU
2. Export the model
3. Deploy with simple_serving: device_id=-1
4. Call the API via Postman (see the request sketch below)
5. The third call fails with an internal error
Repeating steps 3-5 above reproduces the error every time.
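
For step 4, the Postman call can also be reproduced with a small requests script (a sketch only; the host, port and payload shape are assumptions based on the register call above and the usual simple_serving client examples):

import json
import requests

# The route matches the first argument passed to app.register();
# host/port are whatever the server was started with (assumed here).
url = "http://0.0.0.0:8189/models/cls_hierarchical"
headers = {"Content-Type": "application/json"}
payload = {
    "data": {"text": ["待分类的句子"]},
    "parameters": {"max_seq_len": 128, "batch_size": 1},
}

# Three consecutive calls reproduce the behaviour described above.
for i in range(3):
    r = requests.post(url, headers=headers, data=json.dumps(payload))
    print(i + 1, r.status_code, r.text)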

abbydev avatar Nov 03 '23 02:11 abbydev

@wj-Mcat Could you take a look? Thanks!

abbydev avatar Nov 24 '23 07:11 abbydev

I hit the same problem: trained a hierarchical multi-label classification model on GPU, exported it following the official tutorial, and deployed it with simple_serving. Even with only a single sentence per request, repeated calls return an empty result 9 times out of 10; when a result does come back it is wrong, sometimes several labels are returned at once, and the result changes between calls. @abbydev, I saw you were already asking about hierarchical text classification back in March/April; you only got to deployment in December? Are you a student?

imempty avatar Nov 30 '23 16:11 imempty

I really want to support a home-grown framework, but the Paddle community is not very active and the maintainers don't respond. @imempty In March/April I was using the Paddle framework for the first time; the GPU deployment went to production long ago. My problem is this: the model was trained in a GPU environment, and after exporting it I want to deploy it in a CPU environment, which is where I hit this error:

MemoryError: (ResourceExhausted) Fail to alloc memory of 8245807622825612480 size, error code is 12. [Hint: Expected error == 0, but received error:12 != 0:0.] (at /paddle/paddle/fluid/memory/allocation/cpu_allocator.cc:50) [operator < fill_constant > error]

I have searched similar issues; they point to a version mismatch, but shouldn't a newer framework version be backward compatible?
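
To narrow down whether this comes from simple_serving or from the exported model itself, one thing worth trying is loading the exported static graph directly with paddle.inference on CPU (a minimal sketch; the file names, tokenizer name and max length below are assumptions, adjust them to your export):

import numpy as np
import paddle.inference as paddle_infer
from paddlenlp.transformers import AutoTokenizer

# Assumed export layout: export/model.pdmodel + export/model.pdiparams
config = paddle_infer.Config("export/model.pdmodel", "export/model.pdiparams")
config.disable_gpu()  # force CPU, mirroring device_id=-1
predictor = paddle_infer.create_predictor(config)

# Assumed base model; use the tokenizer the model was actually trained with.
tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh")
encoded = tokenizer("待分类的句子", max_length=128, truncation=True)

# Assumes the graph inputs are named input_ids / token_type_ids,
# which is what the PaddleNLP export scripts usually produce.
for name in predictor.get_input_names():
    handle = predictor.get_input_handle(name)
    arr = np.array([encoded[name]], dtype="int64")  # add batch dimension
    handle.reshape(arr.shape)
    handle.copy_from_cpu(arr)

predictor.run()
out_name = predictor.get_output_names()[0]
logits = predictor.get_output_handle(out_name).copy_to_cpu()
print(logits.shape, logits)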

abbydev avatar Dec 11 '23 08:12 abbydev

Does simple_serving + GPU deployment work normally for you? Is the simple_serving + CPU error from the first post resolved? Your new error looks like running out of memory, but that size looks enormous!

imempty avatar Dec 11 '23 08:12 imempty

@imempty
Does simple_serving + GPU deployment work normally for you? ==== Yes.
Is the simple_serving + CPU error from the first post resolved? ==== It starts up, but the third call hits the internal error.
Your new error looks like out-of-memory, but the size looks huge! ==== https://github.com/PaddlePaddle/PaddleNLP/issues/7231 looked related, but what it suggests didn't help in my case.
My understanding was that an exported model is decoupled from the framework, and that a newer Paddle version should be backward compatible, but in practice that isn't the case. I'm starting to think about switching to PyTorch: when a home-grown framework runs into problems the support isn't there, and no maintainer even shows up. That has been Baidu's chronic problem for a decade or more, a strong start with weak follow-through... Forgive me if that sounds harsh, but that's how it is.

abbydev avatar Dec 11 '23 08:12 abbydev

I just checked again: simple_serving doesn't seem to have an explicit option for choosing GPU inference, does it? At least neither its service.py nor client.py has one. I originally picked Paddle because it looked like a turnkey integration, and the tutorials made it seem like following the official steps was all it took. In practice there are a lot of bugs, and even the official project code doesn't run unmodified. If questions keep going unanswered, this will be the next MXNet.

imempty avatar Dec 11 '23 09:12 imempty

@imempty As for an explicit option to select the inference device =====> there is one; I found it by reading the source code: device_id=-1 means CPU (see the sketch below).
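
A minimal sketch of the relevant part of the register call (same handlers as in the first post; the GPU id mapping is my reading of the source, not official documentation):

# device_id selects the inference device for the registered model:
#   device_id=0  -> GPU card 0 (assuming a CUDA build of paddlepaddle)
#   device_id=-1 -> CPU
app.register(
    "models/cls_hierarchical",
    model_path=f'{model_dir}/export',
    tokenizer_name=model_name,
    model_handler=CustomModelHandler,
    post_handler=MultiLabelClassificationPostHandler,
    device_id=0,
)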

abbydev avatar Dec 11 '23 09:12 abbydev

Here is the complete error:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/xxx/lib/python3.7/site-packages/uvicorn/protocols/http/h11_impl.py", line 429, in run_asgi
    self.scope, self.receive, self.send
  File "/home/xxx/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/xxx/lib/python3.7/site-packages/fastapi/applications.py", line 292, in __call__
    await super().__call__(scope, receive, send)
  File "/home/xxx/lib/python3.7/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/xxx/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/home/xxx/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/xxx/lib/python3.7/site-packages/fastapi/routing.py", line 274, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/home/xxx/lib/python3.7/site-packages/fastapi/routing.py", line 192, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/xxx/lib/python3.7/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/xxx/lib/python3.7/site-packages/anyio/to_thread.py", line 34, in run_sync
    func, *args, cancellable=cancellable, limiter=limiter
  File "/home/xxx/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/xxx/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/http_router/router.py", line 61, in predict
    result = self._app._model_manager.predict(inference_request.data, inference_request.parameters)
  File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/model_manager.py", line 94, in predict
    model_output = self._model_handler(self._predictor_list[predictor_id], self._tokenizer, data, parameters)
  File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/handlers/custom_model_handler.py", line 73, in process
    predictor._predictor.run()
MemoryError: (ResourceExhausted) Fail to alloc memory of 8245807622825612480 size, error code is 12.
  [Hint: Expected error == 0, but received error:12 != 0:0.] (at /paddle/paddle/fluid/memory/allocation/cpu_allocator.cc:50)
  [operator < fill_constant > error]

abbydev avatar Dec 11 '23 09:12 abbydev

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Feb 13 '24 00:02 github-actions[bot]

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Apr 27 '24 00:04 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar May 12 '24 00:05 github-actions[bot]