Serving
Does the ernie model conflict with FasterTransformer? Error: symbol runGemmShortApi, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
1. Running python pipeline_service.py starts the service, and everything looks normal:
2023/08/08 23:46:45 start proxy service
2. As soon as the operator is called with curl, pipeline_service.py crashes, and stdout prints the following log:
W0808 23:46:51.424680 206179 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.2, Runtime API Version: 11.1
W0808 23:46:51.427105 206179 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
python: relocation error: /home/work/paddle_env/cuda_pkgs/cuda-11.1/lib64/libcublas.so: symbol runGemmShortApi, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
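Observation: if the line load("FasterTransformer", build_dir="/home/work/paddle_env/ppnlp_home/extensions", verbose=True) is commented out, the program runs fine. However, commenting it out breaks other operators, so the line has to stay.
I don't know what the conflict between the ernie model and FasterTransformer is.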
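One way to narrow this down is sketched below. It assumes Linux, and it assumes the relocation error comes from two different cuBLAS builds ending up in the same process, which the log above does not confirm. The helper names `mapped_libraries` and `current_process_libraries` are made up for illustration and are not part of Paddle or PaddleNLP. The sketch only lists which libcublas / libcublasLt / libcudart files are mapped into the running Python process.

```python
import os


def mapped_libraries(maps_text, needles=("libcublas", "libcudart")):
    """Return the sorted, de-duplicated file paths found in /proc/<pid>/maps
    text whose file name contains any of the given substrings."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [path]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        if any(n in os.path.basename(path) for n in needles):
            found.add(path)
    return sorted(found)


def current_process_libraries(needles=("libcublas", "libcudart")):
    """Linux only: inspect the libraries mapped into this Python process."""
    with open("/proc/self/maps") as f:
        return mapped_libraries(f.read(), needles)


# Hypothetical usage inside pipeline_service.py, right after
# load("FasterTransformer", ...):
#     for lib in current_process_libraries():
#         print(lib)
```

If the printed libcublasLt.so.11 comes from a different directory than /home/work/paddle_env/cuda_pkgs/cuda-11.1/lib64/, the two libraries probably belong to different CUDA toolkits, and that would be the first thing to check.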
Appendix:
1) Part of pipeline_service.py:
if __name__ == "__main__":
    service = ModelService(name="model_service")
    script_dir = os.path.dirname(os.path.abspath(__file__))
    root_dir = os.path.dirname(script_dir)
    load("FasterTransformer", build_dir="/home/work/paddle_env/ppnlp_home/extensions", verbose=True)
    config_file = os.path.join(script_dir, "conf/config.yml")
    service.prepare_pipeline_config(config_file)
    service.run_service()
2) Operator initialization code:
import json
from paddle_serving_server.web_service import Op
import numpy as np
from collections import namedtuple

class TextQualityDetectOp(Op):
    """Text denoising model"""

    def init_op(self):
        """Initialize"""
        from paddlenlp.transformers import AutoTokenizer
        self.tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh",
                                                       use_fast=True)
        self.fetch_names = [
            "linear_147.tmp_1",
        ]