hexisyztem
Sorry, we do not support this feature.
Yes, but there are a lot of details that can be confusing for third-party developers.
For the compile environment, you can refer to this: https://github.com/bytedance/lightseq/blob/master/docker/Tritonserver/Dockerfile.
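For reference, building an image from that Dockerfile is the standard Docker workflow; the command below is only a sketch, assuming you run it from the repository root, and the image tag is arbitrary:

```
# Build the Triton server image from the lightseq repo root (tag name is arbitrary)
docker build -f docker/Tritonserver/Dockerfile -t lightseq-tritonserver .
```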
If you have any questions, you can ask me and I will try my best to help. In the future, we will consider providing users with fine-grained operators to facilitate...
What is the run command you executed?
The next step is to handle the compilation logic: compile the old and new versions into the same .so dynamic library, and select between the models through compile options.
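To make that idea concrete, here is a rough, hypothetical sketch (not the actual lightseq code) of how a compile option could decide which implementation ends up behind the single .so's exported factory; the macro name USE_NEW_MODEL and the class names are made up for illustration:

```cpp
// Sketch only: one stable entry point, two implementations selected at build time.
#include <memory>

namespace lightseq_sketch {

struct Model {
  virtual ~Model() = default;
  virtual void infer() = 0;
};

struct OldModel : Model {
  void infer() override { /* legacy kernel path */ }
};

struct NewModel : Model {
  void infer() override { /* refactored kernel path */ }
};

// The exported factory stays the same; the compile option passed when building
// the .so (e.g. -DUSE_NEW_MODEL=1) decides which implementation it returns.
std::unique_ptr<Model> create_model() {
#if defined(USE_NEW_MODEL)
  return std::make_unique<NewModel>();
#else
  return std::make_unique<OldModel>();
#endif
}

}  // namespace lightseq_sketch
```

Building the library with or without -DUSE_NEW_MODEL=1 then yields the new or old behavior behind the same interface.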
Sorry, I missed your message last week. If you are using an older version of tensorrtserver, the inputs are named "inputs_ids", and the logic is determined in the underlying code:...
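For illustration only, this is roughly where that name would show up in a Triton/tensorrtserver model config (config.pbtxt); only the input name "inputs_ids" comes from the reply above, the data type and dims are placeholders:

```
input [
  {
    name: "inputs_ids"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
```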
Sorry, I just misunderstood what you meant. Currently we do not support token_type_ids.
It will be supported in May, and it is expected that it can be deployed on a V100-32G.
As you can see in https://github.com/HazyResearch/flash-attention, flash attention doesn't support V100.