Minh-Thuc
Hello @ebraraktas, sorry for the late response. I think you should add the ``RUY`` implementation after ``BLAS``. The ``RUY`` implementation should be used only if ``BLAS`` is not available.
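For illustration, a minimal sketch of the fallback order I have in mind (the function and flags here are hypothetical, not CTranslate2's actual dispatch code):

```python
# Hypothetical sketch of the intended backend fallback order:
# prefer BLAS when it is available, and use RUY only as a fallback.
def select_gemm_backend(blas_available: bool, ruy_available: bool) -> str:
    if blas_available:
        return "BLAS"
    if ruy_available:
        return "RUY"
    raise RuntimeError("No GEMM backend available")

assert select_gemm_backend(True, True) == "BLAS"   # BLAS wins when present
assert select_gemm_backend(False, True) == "RUY"   # RUY only without BLAS
```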
Thanks. I relaunched the failed CI. I'll merge it then.
I just pushed PR #1599 to support tensor parallelism. This helps split a model across multiple GPUs. I tested this feature with some models like Llama 2, translators...
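For context, a minimal sketch of how I expect the feature to be used from the Python API (the model path is a placeholder, the ``tensor_parallel`` flag comes with the PR, and the script is meant to be launched under MPI, e.g. ``mpirun -np 2 python run_tp.py``):

```python
import ctranslate2

# Sketch: load a converted model with tensor parallelism enabled,
# splitting its weights across the visible GPUs.
generator = ctranslate2.Generator(
    "llama2-ct2",            # placeholder path to a converted model
    device="cuda",
    tensor_parallel=True,    # split the model across multiple GPUs
)

results = generator.generate_batch([["<s>", "▁Hello"]], max_length=32)
print(results[0].sequences[0])
```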
I updated the comment above for two cases: batch_size = 1 and batch_size = 5.
I'll close this issue as the feature is now supported. If you have any problems, feel free to open a new issue.
I closed this issue as completed. Tensor parallelism is now supported in CTranslate2.
Can you provide more detail on how you ran the converter?
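For instance, is it something like the following run via the Python API? (The model name and output directory here are placeholders:)

```python
import ctranslate2.converters

# Sketch of a typical converter run via the Python API.
converter = ctranslate2.converters.TransformersConverter(
    "meta-llama/Llama-2-7b-hf"   # placeholder: source checkpoint on the HF Hub
)
converter.convert(
    "llama2-ct2",                # placeholder: output directory
    quantization="int8",         # optional weight quantization
)
```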
If you specify target_prefix, the prefix is decoded in a single step, and the remaining tokens are then generated one by one in the next steps. Without target_prefix, it generates tokens one by one from the start...
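A minimal sketch of the two modes with the Python API (model path and tokens are placeholders):

```python
import ctranslate2

translator = ctranslate2.Translator("ende-ct2", device="cpu")  # placeholder model

source = [["▁Hello", "▁world", "</s>"]]

# With target_prefix: the prefix tokens are decoded in one forward step,
# then the remaining tokens are generated one by one.
with_prefix = translator.translate_batch(source, target_prefix=[["▁Hallo"]])

# Without target_prefix: every target token is generated one by one.
without_prefix = translator.translate_batch(source)

print(with_prefix[0].hypotheses[0])
print(without_prefix[0].hypotheses[0])
```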
Hello, what is the average seq_length in your benchmark? Flash attention only gives better performance for long prompts.
I mean the number of input tokens. It would be great to compare with and without FA2 for prompt sizes from 1000 to 3000 tokens. I think the prompt...
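For example, a rough benchmark sketch along these lines (assuming the ``flash_attention`` option on the Generator; the model path and tokens are placeholders):

```python
import time
import ctranslate2

def benchmark(flash_attention: bool, prompt_len: int) -> float:
    # Sketch: time generation over a synthetic long prompt.
    generator = ctranslate2.Generator(
        "llama2-ct2",                    # placeholder converted model
        device="cuda",
        flash_attention=flash_attention, # toggle FA2 on or off
    )
    prompt = [["▁token"] * prompt_len]   # synthetic prompt of prompt_len tokens
    start = time.perf_counter()
    generator.generate_batch(prompt, max_length=64)
    return time.perf_counter() - start

for prompt_len in (1000, 2000, 3000):
    t_fa = benchmark(True, prompt_len)
    t_base = benchmark(False, prompt_len)
    print(f"{prompt_len} tokens: FA2 {t_fa:.2f}s vs baseline {t_base:.2f}s")
```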