PaddleNLP
[Bug]: Hierarchical multi-label classification model trained on GPU fails at inference after CPU deployment
Software environment
# pip list | grep paddle
paddle-bfloat 0.1.7
paddle2onnx 1.0.9
paddlefsl 1.1.0
paddlenlp 2.5.2
paddlepaddle-gpu 2.5.1.post112
Duplicate check
- [X] I have searched the existing issues
Bug description
app = SimpleServer()
app.register(
    "models/cls_hierarchical",
    model_path=f'{model_dir}/export',
    tokenizer_name=model_name,
    model_handler=CustomModelHandler,
    post_handler=MultiLabelClassificationPostHandler,
    device_id=-1
)
After exporting the model trained in a GPU environment and deploying it on CPU as shown above (device_id=-1), the server reports a successful deployment.
1st call to the inference REST API: the inference result is returned correctly.
2nd call to the inference REST API: an empty array is returned.
3rd call to the inference REST API: an internal service error is raised (Internal service error).
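For reference, a minimal client sketch approximating the Postman calls, assuming the default simple_serving request schema; the host, port, sample text and max_seq_len below are placeholders, not values taken from this report:

import json
import requests

# Hypothetical endpoint: the path matches the name passed to app.register above;
# host and port are placeholders for wherever the SimpleServer instance is running.
url = "http://0.0.0.0:8189/models/cls_hierarchical"
headers = {"Content-Type": "application/json"}

payload = {
    "data": {"text": ["待分类的一条示例文本"]},
    "parameters": {"max_seq_len": 128, "batch_size": 1},
}

# Call the endpoint three times to mirror the behaviour described above:
# 1st call returns labels, 2nd returns an empty array, 3rd fails with an internal error.
for i in range(3):
    resp = requests.post(url, headers=headers, data=json.dumps(payload))
    print(f"call {i + 1}: status={resp.status_code}, body={resp.text}")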
Steps to reproduce & code
- Environment info
Linux version 3.10.0-1160.92.1.el7.x86_64
Tesla T4 16G
NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8
- Reproduction steps
1. Train the model on GPU.
2. Export the model (see the export sketch after this list).
3. Deploy it with simple_serving, using device_id=-1.
4. Call the API with Postman.
5. The internal error is raised on the 3rd call.
Repeating steps 3-5 above reproduces the problem every time.
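For context on step 2, here is a sketch of what the export step does, assuming the standard dynamic-to-static export used in the PaddleNLP text classification examples (normally driven by the example's export_model.py); the checkpoint path, output path, saved file prefix, and model class below are assumptions, not taken from this report:

import os
import paddle
from paddlenlp.transformers import AutoModelForSequenceClassification

# Placeholder paths: the fine-tuned GPU checkpoint and the directory later
# passed as model_path to app.register.
params_path = "./checkpoint"
output_path = "./export"

model = AutoModelForSequenceClassification.from_pretrained(params_path)
model.eval()

# Convert the dynamic-graph model to a static graph with variable batch/sequence dims.
input_spec = [
    paddle.static.InputSpec(shape=[None, None], dtype="int64", name="input_ids"),
    paddle.static.InputSpec(shape=[None, None], dtype="int64", name="token_type_ids"),
]
static_model = paddle.jit.to_static(model, input_spec=input_spec)

# Saves <prefix>.pdmodel / <prefix>.pdiparams; the "model" prefix here is an assumption.
paddle.jit.save(static_model, os.path.join(output_path, "model"))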
@wj-Mcat could you take a look? Thanks!
I've hit the same problem: I also trained a hierarchical multi-label classification model on GPU, exported it following the official tutorial, and deployed it with simple_serving. I send only one sentence per request, and across repeated requests 9 out of 10 return an empty result; when something is returned the result is wrong, sometimes several labels come back at once, and the result changes from request to request. @abbydev, I saw you were already asking about hierarchical text classification back in March or April; you're only getting to deployment now in December? Are you a student?
I really want to support a domestic framework, but the Paddle community is not very active and the maintainers don't respond. @imempty I first used the Paddle framework in March-April, and the GPU deployment went live long ago. The problem I'm facing now is: I trained the model in a GPU environment, exported it, and want to deploy it in a CPU environment, and I run into this error: MemoryError: (ResourceExhausted) Fail to alloc memory of 8245807622825612480 size, error code is 12. [Hint: Expected error == 0, but received error:12 != 0:0.] (at /paddle/paddle/fluid/memory/allocation/cpu_allocator.cc:50) [operator < fill_constant > error]
I also searched similar issues for this error; it looks like a version-matching problem, but shouldn't a newer framework version be backward compatible?
Does simple_serving + GPU deployment work correctly for you? Has the simple_serving + CPU error described at the top of this issue been resolved? Your new error looks like running out of memory, but that size value looks enormous!
@imempty "Does simple_serving + GPU deployment work correctly for you?" ==== Yes. "Has the simple_serving + CPU error described at the top of this issue been resolved?" ==== The server starts, but after three calls it throws an internal error. "Your new error looks like running out of memory, but that size value looks enormous!" ===== https://github.com/PaddlePaddle/PaddleNLP/issues/7231, but it didn't help when I tried it. My understanding is that an exported model should be decoupled from the framework, and a newer Paddle version should be backward compatible, but in practice that is not the case. I'm almost tempted to switch to PyTorch: when a domestic framework hits problems, the support is insufficient and no official maintainer even shows up. This has been Baidu's recurring problem for over a decade, a strong start with weak follow-through... Forgive me if my words sound harsh, but that is how it is.
I just checked again: simple_serving doesn't have an explicit option to enable GPU inference, does it? At least there is nothing in its service.py or client.py. I originally chose Paddle for its turnkey integration; the tutorials make it look like following the official steps is enough to get the job done. In practice there are a lot of bugs, and the official project code doesn't run as-is. If questions keep going unanswered, it will end up as the next mxnet.
@imempty "an explicit option to enable GPU inference" =====> this does exist; I found it by reading the source code. device_id=-1 means CPU.
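To make that concrete, a sketch of the register call from the top of this issue with only the device selection changed; the meaning of device_id=-1 is as stated above, while device_id=0 selecting the first GPU is an assumption inferred from this discussion, not confirmed by the source:

# Same registration as at the top of this issue; only device_id differs.
# device_id=-1 -> CPU inference (as found in the source by the commenter above);
# device_id=0  -> first GPU (assumption, consistent with how the option is described here).
app.register(
    "models/cls_hierarchical",
    model_path=f"{model_dir}/export",
    tokenizer_name=model_name,
    model_handler=CustomModelHandler,
    post_handler=MultiLabelClassificationPostHandler,
    device_id=0,  # use -1 here to fall back to CPU
)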
Here is the full error:
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/xxx/lib/python3.7/site-packages/uvicorn/protocols/http/h11_impl.py", line 429, in run_asgi
self.scope, self.receive, self.send
File "/home/xxx/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/fastapi/applications.py", line 292, in __call__
await super().__call__(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/xxx/lib/python3.7/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/xxx/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
raise e
File "/home/xxx/lib/python3.7/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
await self.app(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/xxx/lib/python3.7/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/home/xxx/lib/python3.7/site-packages/fastapi/routing.py", line 274, in app
dependant=dependant, values=values, is_coroutine=is_coroutine
File "/home/xxx/lib/python3.7/site-packages/fastapi/routing.py", line 192, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/home/xxx/lib/python3.7/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/xxx/lib/python3.7/site-packages/anyio/to_thread.py", line 34, in run_sync
func, *args, cancellable=cancellable, limiter=limiter
File "/home/xxx/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/xxx/lib/python3.7/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/http_router/router.py", line 61, in predict
result = self._app._model_manager.predict(inference_request.data, inference_request.parameters)
File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/model_manager.py", line 94, in predict
model_output = self._model_handler(self._predictor_list[predictor_id], self._tokenizer, data, parameters)
File "/home/xxx/lib/python3.7/site-packages/paddlenlp/server/handlers/custom_model_handler.py", line 73, in process
predictor._predictor.run()
MemoryError: (ResourceExhausted) Fail to alloc memory of 8245807622825612480 size, error code is 12.
[Hint: Expected error == 0, but received error:12 != 0:0.] (at /paddle/paddle/fluid/memory/allocation/cpu_allocator.cc:50)
[operator < fill_constant > error]
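One way to narrow this down is to load the exported model directly with the Paddle inference API on CPU, bypassing simple_serving entirely: if the same fill_constant allocation error appears here, the problem is in the exported program or the inference engine rather than the server layer. A minimal sketch, assuming the export produced model.pdmodel / model.pdiparams under the export directory and that the tokenizer matches the one used for training; the paths, tokenizer name, sample text, and input names are placeholders:

import numpy as np
from paddle.inference import Config, create_predictor
from paddlenlp.transformers import AutoTokenizer

# Placeholder paths/names; adjust to the actual export directory and tokenizer.
config = Config("export/model.pdmodel", "export/model.pdiparams")
config.disable_gpu()  # force CPU inference, matching device_id=-1

predictor = create_predictor(config)
tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh")

encoded = tokenizer(["待分类的一条示例文本"], max_length=128, padding=True, truncation=True)
inputs = {
    "input_ids": np.array(encoded["input_ids"], dtype="int64"),
    "token_type_ids": np.array(encoded["token_type_ids"], dtype="int64"),
}

# Feed each named input, run the static graph, and fetch the logits.
# Assumes the exported graph's inputs are named input_ids / token_type_ids.
for name in predictor.get_input_names():
    handle = predictor.get_input_handle(name)
    handle.copy_from_cpu(inputs[name])
predictor.run()
logits = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(logits.shape)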
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.