Jiarui Fang(方佳瑞)

Results 220 comments of Jiarui Fang(方佳瑞)

That looks fairly normal. Run the benchmark several times to avoid the warmup overhead, and try setting the number of OMP threads.

Yes. You can set it as shown here: https://github.com/Tencent/TurboTransformers/blob/master/benchmark/run_cpu_variable_benchmark.sh#L31
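As a minimal sketch of what the linked script does (the exact thread count of 4 here is an assumption, not the value the script uses), pin the OpenMP and MKL thread counts before launching the benchmark:

```shell
# Hedged sketch: fix thread counts so benchmark runs are reproducible.
# TurboTransformers uses MKL for matmul and OpenMP for other operators,
# so both variables matter on CPU.
export OMP_NUM_THREADS=4   # OpenMP worker threads (value is illustrative)
export MKL_NUM_THREADS=4   # MKL threading (assumption: MKL honors this here)
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

Setting these before the first inference matters because thread pools are typically sized once at startup.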

We rewrote the PyTorch code in C++, added operator fusion, used MKL for matrix multiplication, and parallelized the remaining operators with OpenMP. There is no related paper.

Hello, sorry for my late response. I assume MTK -> MKL. Please refer to the README.md of branch `ppopp21_artifact_centos` for installation without docker. https://github.com/Tencent/TurboTransformers/tree/ppopp21_artifact_centos

BTW, you can copy the installation scripts from that branch to branch `master`, or merge `master` into branch `ppopp21_artifact_centos`.

Let me make sure you are using the CPU for inference and that your Turbo version is 0.4.1. Generally, the first inference after the runtime launches is very slow; you need to warm it up first.
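The warmup advice above can be sketched as a small timing helper: discard the first few runs (whose latency is inflated by lazy initialization such as memory-pool and thread-pool setup), then average the rest. The function names and the stand-in workload here are illustrative, not part of the TurboTransformers API:

```python
import time

def benchmark(fn, warmup=5, iters=50):
    """Time fn, discarding warmup iterations so one-time startup
    costs do not pollute the measured average latency."""
    for _ in range(warmup):
        fn()                      # warmup runs: timed nowhere
    start = time.perf_counter()
    for _ in range(iters):
        fn()                      # measured runs
    return (time.perf_counter() - start) / iters

# Usage with a stand-in CPU workload (replace with your model's forward pass).
avg = benchmark(lambda: sum(i * i for i in range(10_000)))
```

`time.perf_counter` is used because it is monotonic and has the highest available resolution for interval timing.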

What do you mean by building your own transformer? Any model composed of transformer blocks can reuse the transformer interfaces we have already written: https://github.com/Tencent/TurboTransformers/blob/master/turbo_transformers/python/tests/bert_layer_test.py A BERT layer is itself a transformer structure.

Hi, thanks for your attention to this project. We need contributors in the following areas: 1. Low-precision support: INT8 on CPU (work in progress), FP16 on GPU (TODO). 2. ...

Could you please send your contact information to my email address? I am willing to provide some personal guidance for joining this project.

Turbo has supported standard encoder-decoder NMT models. I have not studied the details of BART, but I believe the approach should be similar. https://github.com/TurboNLP/Translate-Demo/blob/master/mytranslator.py