TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
I don't understand why the image is so large. 7-8 GB is not very friendly.
When we use the Transformer decoder in TurboTransformers v0.3.0, a CUDA error appears. The error is shown below. RuntimeError: CUDA error: an illegal memory access was encountered [TT_ERROR] CUDA runtime...
Hello, I am using convert_tf_model_to_npz under tools to convert a model of my own, trained with TF 1.14, to npz. 1. The model is placed under the tools directory, as follows:  2. The code is changed like this:  3. The command used is:  4. The error is:  I would like to understand the cause. I recall that a TF 2.0 pretrained model from Hugging Face also failed. Is there something wrong with how I wrote the path? Thanks for the help.
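For context on what the conversion target looks like: an `.npz` file is just a zip of named NumPy arrays, so a checkpoint-to-npz converter ultimately boils down to collecting `{tensor_name: ndarray}` pairs and calling `np.savez`. A minimal sketch with stand-in weights (the dict below is hypothetical; in a real converter the arrays would come from the TF checkpoint reader, e.g. `tf.train.load_checkpoint(...).get_tensor(name)`):

```python
import io
import numpy as np

# Hypothetical stand-in for tensors read from a TF checkpoint.
weights = {
    "bert/embeddings/word_embeddings": np.random.rand(100, 8).astype(np.float32),
    "bert/encoder/layer_0/attention/self/query/kernel": np.random.rand(8, 8).astype(np.float32),
}

# npz is a zip of .npy arrays keyed by name; slashes in keys are allowed.
buf = io.BytesIO()  # an on-disk path would normally be used here
np.savez(buf, **weights)
buf.seek(0)

restored = np.load(buf)
# Round-trip check: every tensor survives with its name, shape, and values.
assert set(restored.files) == set(weights)
```

If the converter fails before this point, the usual culprits are a wrong checkpoint path (TF expects the checkpoint *prefix*, not a single file) or a name mismatch between the checkpoint's variable names and the names the script expects.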
I am trying to use TurboTransformers for inference on a trained BERT model (fastai with Hugging Face Transformers). I followed the steps mentioned under the section '**How to customised your post-processing...
Do the torch baselines in the benchmark https://github.com/Tencent/TurboTransformers/blob/master/docs/bert.md use `.half()` (FP16)?
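The question matters because FP16 halves the bytes moved per parameter and, on recent GPUs, unlocks Tensor Core throughput, so an FP32-vs-FP16 baseline comparison is not apples to apples. A torch-free NumPy illustration of what `.half()` does to each tensor's storage and precision (the shapes here are my own example):

```python
import numpy as np

# A BERT-base-sized weight matrix in FP32.
w32 = np.random.rand(768, 768).astype(np.float32)

# .half() in torch casts each parameter like this:
w16 = w32.astype(np.float16)

print(w32.nbytes, w16.nbytes)  # FP16 uses exactly half the bytes

# Rounding cost of the cast: for values in [0, 1), FP16 carries
# ~11 bits of mantissa, so the error stays below ~2^-11.
max_err = np.max(np.abs(w32 - w16.astype(np.float32)))
print(max_err)
```

Whether the published QPS numbers used this cast is exactly what the benchmark doc should state.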
```
/opt/conda/lib/python3.7/site-packages/torch/onnx/utils.py:738: UserWarning: ONNX export failed on ATen operator einsum because torch.onnx.symbolic_opset9.einsum does not exist
  .format(op_name, opset_version, op_name))
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 121, in worker
...
```
Could you explain a bit more about the support for variable length? Does it mean the runtime can support inputs with different sequence lengths in a single session, like [batch, 8],...
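One common reading of "variable-length support" is the padded-batch convention: sequences of different lengths are packed into a single `[batch, max_len]` tensor plus an attention mask that zeroes out the padding. A minimal sketch of that convention (this illustrates the input layout only; it is not a claim about how TurboTransformers handles it internally, which may avoid padding altogether):

```python
import numpy as np

# Three requests with different sequence lengths in one batch.
seqs = [[101, 2009, 102], [101, 7592, 2088, 999, 102], [101, 102]]
max_len = max(len(s) for s in seqs)

input_ids = np.zeros((len(seqs), max_len), dtype=np.int64)
attention_mask = np.zeros((len(seqs), max_len), dtype=np.int64)
for i, s in enumerate(seqs):
    input_ids[i, : len(s)] = s       # real tokens, left-aligned
    attention_mask[i, : len(s)] = 1  # 1 = attend, 0 = padding

print(input_ids.shape)        # (3, 5)
print(attention_mask.sum(1))  # per-row real-token counts: [3 5 2]
```

The interesting follow-up for the runtime is whether it pays the compute cost of `max_len` for every row or only for the real tokens.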
The logic inside the MultiheadAttention layer is now too complex for development. Moreover, some bugs exist in the management of intermediate results. It is the first priority to rewrite this code to make others...
Dear developers, I am trying to reproduce the [bert benchmarking result](https://github.com/Tencent/TurboTransformers/blob/master/docs/bert.md) on my machine. I just run `bash run_gpu_benchmark.sh`, but the QPS is much lower than the declared value....
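When QPS numbers disagree across machines, the measurement methodology is the first thing to pin down: warmup iterations, timer placement, and (for GPU) synchronization before reading the clock. A minimal harness sketch showing those three points (function and parameter names are mine, not the benchmark script's):

```python
import time

def measure_qps(fn, n_iters=100, warmup=10):
    """Rough queries-per-second for fn(). Warm up first so one-time
    costs (allocation, JIT, cuDNN autotune) are excluded. For GPU work,
    a device synchronization (e.g. torch.cuda.synchronize()) is needed
    before each clock read, or the timer only covers kernel launches."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

# Toy workload standing in for a model forward pass.
qps = measure_qps(lambda: sum(range(1000)))
print(qps)
```

Beyond methodology, lower-than-declared QPS usually traces to hardware differences (GPU model, clocks, PCIe), batch/sequence-length settings, or a debug build of the library.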
Using FBGEMM to support CPU quantization.
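FBGEMM executes the int8 matrix multiplications; what the framework layers on top is an affine (scale and zero-point) quantization scheme for weights and activations. A NumPy sketch of that per-tensor scheme, to show what the int8 representation costs in accuracy (this illustrates the arithmetic only and is not FBGEMM's API):

```python
import numpy as np

def quantize_uint8(x):
    """Affine per-tensor quantization of a float tensor to uint8."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0           # guard against a constant tensor
    zero_point = int(round(-lo / scale))        # uint8 code that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale, zp = quantize_uint8(w)

# Round-trip error is bounded by about one quantization step (scale).
err = np.max(np.abs(w - dequantize(q, scale, zp)))
print(scale, err)
```

Production schemes refine this with per-channel scales for weights and calibrated activation ranges, which is where most of the accuracy recovery comes from.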