hexisyztem
Sorry, we do not support this feature.
Yes, but there are a lot of details that can be confusing for third-party developers.
For the compile environment, you can refer to this: https://github.com/bytedance/lightseq/blob/master/docker/Tritonserver/Dockerfile.
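For reference, building an image from that Dockerfile is the standard Docker workflow; the command below is only a sketch, assuming you run it from the repository root, and the image tag is arbitrary:

```
# Build the Triton server image from the lightseq repo root (tag name is arbitrary)
docker build -f docker/Tritonserver/Dockerfile -t lightseq-tritonserver .
```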
If you have any questions, you can ask me and I will try my best to help. In the future, we will consider providing users with fine-grained operators to facilitate...
What is the run command you executed?
The next step is to handle the compilation logic: compile the old and new versions into the same .so dynamic library, and select between the models through compile options.
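To make that idea concrete, here is a rough, hypothetical sketch (not the actual lightseq code) of how a compile option could decide which implementation ends up behind the single .so's exported factory; the macro name USE_NEW_MODEL and the class names are made up for illustration:

```cpp
// Sketch only: one stable entry point, two implementations selected at build time.
#include <memory>

namespace lightseq_sketch {

struct Model {
  virtual ~Model() = default;
  virtual void infer() = 0;
};

struct OldModel : Model {
  void infer() override { /* legacy kernel path */ }
};

struct NewModel : Model {
  void infer() override { /* refactored kernel path */ }
};

// The exported factory stays the same; the compile option passed when building
// the .so (e.g. -DUSE_NEW_MODEL=1) decides which implementation it returns.
std::unique_ptr<Model> create_model() {
#if defined(USE_NEW_MODEL)
  return std::make_unique<NewModel>();
#else
  return std::make_unique<OldModel>();
#endif
}

}  // namespace lightseq_sketch
```

Building the library with or without -DUSE_NEW_MODEL=1 then yields the new or old behavior behind the same interface.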
Sorry, I missed your message last week. If you are using an older version of tensorrtserver, the inputs are named "inputs_ids", and the logic is determined in the underlying code:...
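For illustration only, this is roughly where that name would show up in a Triton/tensorrtserver model config (config.pbtxt); only the input name "inputs_ids" comes from the reply above, the data type and dims are placeholders:

```
input [
  {
    name: "inputs_ids"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
```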
Sorry, I just misunderstood what you meant. Currently we do not support token_type_ids.
It will be supported in May, and it is expected that it can be deployed on a V100-32G.
As you can see in https://github.com/HazyResearch/flash-attention, flash attention doesn't support V100.