jieba: POS tagging in paddle mode may fail under concurrency

Open hscspring opened this issue 4 years ago • 7 comments

The failing code is in predict.py:

def get_result(str1):
    feed_data=dataset.get_vars(str1)
    a = numpy.array(feed_data).astype(numpy.int64)
    a=a.reshape(-1,1)
    c = fluid.create_lod_tensor(a, [[a.shape[0]]], place)

    words, crf_decode = exe.run(
            infer_program,
            fetch_list=[infer_ret['words'], infer_ret['crf_decode']],
            feed={"words":c, },
            return_numpy=False,
            use_program_cache=True)
    results=[]
    results += utils.parse_result(words, crf_decode, dataset)
    return results

The cause is that exe.run does not execute successfully and returns an empty list. Unpacking that empty list into words, crf_decode then raises an error.
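To illustrate the failure mode and one possible workaround, here is a minimal sketch. It does not use Paddle at all: `fake_run` is a hypothetical stand-in for `exe.run`, and `unpack_outputs`, `run_serialized` and `_run_lock` are names I made up for this example. It assumes the problem comes from overlapping calls into a shared executor, which I have not confirmed, so treat the lock as something to try rather than a known fix.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lock for this sketch; it is not part of jieba or Paddle.
_run_lock = threading.Lock()


def unpack_outputs(outputs):
    """Check the fetched outputs before unpacking them.

    A bare `words, crf_decode = []` raises ValueError; this gives a
    clearer error when the run returned nothing.
    """
    if len(outputs) != 2:
        raise RuntimeError("expected 2 fetched outputs, got %d" % len(outputs))
    words, crf_decode = outputs
    return words, crf_decode


def run_serialized(run_fn, *args, **kwargs):
    """Call run_fn while holding the lock, so calls never overlap."""
    with _run_lock:
        return run_fn(*args, **kwargs)


def fake_run(text):
    # Stand-in for exe.run: returns two outputs per call.
    tokens = text.split()
    return [tokens, ["tag"] * len(tokens)]


def tag(text):
    return unpack_outputs(run_serialized(fake_run, text))


# Drive the wrapped call from several threads at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(tag, ["a b", "c d e"] * 20))
```

In the real code the lock would go around the `exe.run(...)` call inside `get_result`. This trades throughput for safety, since all tagging requests are then processed one at a time.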

Another problem under concurrency is that tokens may come out very long, i.e. the text is not segmented where it should be.

Test environment: macOS Mojave 10.14.6, 2.7 GHz Intel Core i5, 8 GB 1867 MHz DDR3

gRPC server, Python 3.7.4

Test tool: ghz (a simple gRPC benchmarking and load testing tool)

I also put together a simple reproduction, in case it helps: https://github.com/hscspring/pseg_paddle/tree/master/stress_test