hexisyztem
> Because CMAKE_CUDA_ARCHITECTURES 60/61 don't support atomicAdd(__half*, float)
> > What is your GPU device model, and what is your CUDA version? If you compile on your own device, you can try modifying this line (https://github.com/bytedance/lightseq/blob/master/CMakeLists.txt#L88) to...
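The suggested modification might look like the following; a sketch only, assuming the `CMAKE_CUDA_ARCHITECTURES` variable named in the quoted comment, with illustrative architecture values (half-precision `atomicAdd(__half*, ...)` requires compute capability 7.0 or newer, which is why 60/61 fail):

```cmake
# Sketch: in lightseq's CMakeLists.txt near the linked line, drop
# architectures 60/61 and keep only ones that support half atomics.
# Pick the values matching your actual GPU (e.g. 70 = V100, 75 = T4, 80 = A100).
set(CMAKE_CUDA_ARCHITECTURES 70 75 80)
```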
If you want to run the test code, you can run it directly with `python3 test/xxx.py`.
If you mean multiple models performing inference at the same time, you can implement multi-card deployment through Triton server; lightseq provides a solution for integrating with triton-server.
What is your configuration file? I would guess you have assigned all the models to GPU-0, but I would need to see your configuration to analyze it.
By the way, `instance_group` - `count` needs to be set to 1. https://github.com/bytedance/lightseq/blob/master/examples/triton_backend/model_repo/transformer_example/config.pbtxt#L25
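The relevant section of each model's `config.pbtxt` might look like the sketch below; this is not copied from the linked file, just an illustration assuming standard Triton `instance_group` syntax, with placeholder GPU ids:

```
instance_group [
  {
    count: 1        # keep count at 1, as noted above
    kind: KIND_GPU
    gpus: [ 0 ]     # pin this model to one GPU; give each model a
                    # different id here for multi-card deployment
  }
]
```

If every model omits `gpus` (or lists GPU 0), Triton will place them all on GPU-0, which matches the symptom described.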
We do not currently support it. After we complete the development of the new architecture, we will find ways to support more models at a low cost.
Sorry, this is by design in the new architecture, which uses some fixed syntax to manage GPU memory sharing.
The set_ancestor function assigns cache_k to a contiguous segment within total_cache_k. In the specific case you gave, cache_k_out can be removed.
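The idea can be illustrated with a hypothetical sketch using NumPy views; the real set_ancestor in lightseq operates on GPU memory and its signature may differ, so treat the function below as an analogy only:

```python
import numpy as np

# total_cache_k plays the role of the shared parent buffer; each
# cache_k is a contiguous slice (view) into it, which is what
# set_ancestor establishes in the new architecture's memory sharing.
total_cache_k = np.zeros(1024, dtype=np.float16)

def set_ancestor(parent, offset, size):
    # Hypothetical stand-in: return a view into `parent` at `offset`.
    # Writes through the view land in the parent buffer directly,
    # which is why a separate cache_k_out buffer is unnecessary.
    return parent[offset:offset + size]

cache_k = set_ancestor(total_cache_k, offset=0, size=256)
cache_k[:] = 1.0  # writing to the view updates total_cache_k in place
```

Because cache_k is a view rather than a copy, kernels that write the key cache populate total_cache_k directly, with no copy-out step.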