Minh-Thuc comments

Results: 86 comments of Minh-Thuc

Output tokens logits

The ``generate_tokens`` method has a ``return_log_prob`` option, which is ``false`` by default. Enable it and read the log probability from ``result.log_prob``.
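As a rough illustration of the comment above, the sketch below collects each streamed token together with its ``log_prob``. ``FakeGenerator`` and ``FakeStepResult`` are hypothetical stand-ins, so the snippet runs without a converted model; with CTranslate2 installed, a real ``ctranslate2.Generator`` would take their place, and the model path shown in the comment is a placeholder.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class FakeStepResult:
    """Hypothetical stand-in for one streamed generation step."""
    token: str
    log_prob: Optional[float]


class FakeGenerator:
    """Hypothetical stand-in that mimics only the part of the interface used here."""

    def generate_tokens(
        self, prompt: List[str], return_log_prob: bool = False
    ) -> Iterator[FakeStepResult]:
        # Fixed, made-up values purely for demonstration.
        for token, log_prob in [("Hello", -0.1), ("world", -0.7)]:
            yield FakeStepResult(token, log_prob if return_log_prob else None)


def collect_log_probs(generator, prompt: List[str]) -> List[Tuple[str, Optional[float]]]:
    """Stream tokens and keep each token with its log probability."""
    return [
        (step.token, step.log_prob)
        for step in generator.generate_tokens(prompt, return_log_prob=True)
    ]


# With the real library (assumption: "model_dir" holds a converted model):
#   import ctranslate2
#   generator = ctranslate2.Generator("model_dir")
generator = FakeGenerator()
for token, log_prob in collect_log_probs(generator, ["<s>"]):
    print(token, log_prob)
```

Without ``return_log_prob=True``, the stand-in leaves ``log_prob`` unset, which mirrors the option being off by default.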

Output tokens logits

Hello, I will include the log prob for the whole vocabulary.

Likely bug: log_prob is not affected by sampling_temperature

You are using ``sampling_topk = 1``. In this case, the ``best sampler`` is used and we don't use ``sampling_temperature`` to randomize the sample (the ``random sampler`` is affected by ``sampling_temperature`` instead)....
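To see why temperature has no effect on the chosen token when only the top candidate is kept, here is a small conceptual sketch. It is not CTranslate2's implementation; the helper names and the logits are made up for illustration. Temperature rescales the distribution, but the highest-scoring token stays the same.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    """Index of the largest value, i.e. the token a top-1 selection would pick."""
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, 1.0, 0.5]  # made-up scores for a 3-token vocabulary
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    # The probabilities change with the temperature, the top-1 index does not.
    print(temperature, argmax(probs), [round(p, 3) for p in probs])
```

Temperature only matters once more than one candidate can be drawn, for example with a larger ``sampling_topk``.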

Asynchronous execution: High latency when retrieving results

How do you measure the time to get the results for the first 1000 samples? Normally, when 100 000 samples are passed in the ``for-loop 1``, the first 1000 samples...

DirectML Support

Currently, CTranslate2 does not support ``DirectML``. Supporting it would require a new implementation for this backend.

Qwen Support

Currently, we have no plans to support this model. It will stay in the backlog for the future.

Request to support FlashAttention in cuda attention.cc

CTranslate2 will soon support Flash Attention 2, following PR #1651. I will make the release as soon as possible. I ran some tests and saw a performance improvement with long prompts....

Request to support FlashAttention in cuda attention.cc

> This is great! Any chance you could provide some tips as to how to test this on faster-whisper?

Make sure you have an Ampere GPU or newer. You can just...
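"Ampere or newer" corresponds to a CUDA compute capability of 8.0 or higher. A minimal sketch of that check follows; ``is_ampere_or_newer`` is a hypothetical helper, and how the capability string is obtained (for example from the GPU vendor's tooling) is left as an assumption.

```python
def is_ampere_or_newer(compute_capability: str) -> bool:
    """Return True when a capability string such as "8.6" has major version >= 8."""
    major = int(compute_capability.strip().split(".")[0])
    return major >= 8


# Example capability strings: Turing (7.5), Ampere (8.6), Hopper (9.0).
for capability in ("7.5", "8.6", "9.0"):
    print(capability, is_ampere_or_newer(capability))
```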

Request to support FlashAttention in cuda attention.cc

Hello, I have not benchmarked Faster Whisper, but there are some Flash Attention benchmarks with LLM models [here](https://github.com/OpenNMT/CTranslate2/issues/1676).

Request to support FlashAttention in cuda attention.cc

> Thank you for your attempt to help! 😄 I will post this question directly in the `faster-whisper` repo while waiting for @minhthuc2502's response.

With recent tests, I posted...