CTranslate2

Fast inference engine for Transformer models

Results 173 CTranslate2 issues

**Context** In language model generation, we use the hyperparameter `sampling_temperature` to adjust the probability distribution of predicting the next token. A smaller `sampling_temperature` sharpens the distribution, whereas a larger `sampling_temperature`...
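To illustrate the sharpening effect described above, here is a minimal sketch of a temperature-scaled softmax in plain Python. The helper name `softmax_with_temperature` and the example logits are hypothetical; this is not CTranslate2's internal implementation, only the standard formulation of dividing logits by the temperature before normalizing.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Hypothetical helper: softmax over logits divided by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)
base = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=2.0)
```

With these inputs, the most likely token receives its largest share of probability mass at temperature 0.5 and its smallest at temperature 2.0.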

This PR adds an SGEMM implementation with RUY. This was already [mentioned](https://github.com/SYSTRAN/faster-whisper/issues/237) in the `faster-whisper` repository. I implemented this because my experience with BLAS on Android was worse than this, and BLAS...

Hi, can I install CTranslate2 on ppc64? Regards

Hello, Unless I'm mistaken, I don't see any option in the translator `translate_batch` or `generate_tokens` functions to output the logits/probabilities of the generated tokens. However, this computation must be done...

Assume that I already followed Microsoft's instructions to [Enable PyTorch with DirectML on Windows](//learn.microsoft.com/en-us/windows/ai/directml/gpu-pytorch-windows) and the DirectML library loads correctly according to MS's example code. If I wanted to use...

enhancement

In Hugging Face Transformers, there is a generation option called `sequence_bias` that increases or decreases the logits of user-specified token sequences, using the `SequenceBiasLogitsProcessor`. It would be nice to have such a generation option...
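As a rough sketch of the idea behind this request: a sequence bias adds a fixed value to the logit of a sequence's final token whenever the tokens generated so far end with the rest of that sequence. The function `apply_sequence_bias` and the token IDs below are hypothetical and written in plain Python; they are neither the Hugging Face implementation nor an existing CTranslate2 option.

```python
def apply_sequence_bias(logits, generated, sequence_bias):
    """Hypothetical helper: return a copy of `logits` with sequence biases applied.

    logits: list of floats, one per vocabulary ID, for the next token.
    generated: list of token IDs produced so far.
    sequence_bias: dict mapping a tuple of token IDs to a float bias.
    """
    biased = list(logits)
    for sequence, bias in sequence_bias.items():
        prefix, last = sequence[:-1], sequence[-1]
        # A single-token sequence is always biased; a longer one only when
        # the generated text currently ends with the sequence's prefix.
        if len(prefix) == 0 or tuple(generated[-len(prefix):]) == prefix:
            biased[last] += bias
    return biased


# Made-up example: vocabulary of 4 tokens, all logits equal.
next_logits = [0.0, 0.0, 0.0, 0.0]
bias_table = {(2, 3): 5.0, (0,): -1.0}
after_match = apply_sequence_bias(next_logits, [1, 2], bias_table)
no_match = apply_sequence_bias(next_logits, [1], bias_table)
```

Here token 3 is boosted only when the previous token was 2, while token 0 is penalized at every step.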

When I tried some other T5 models that use the T5Tokenizer, e.g. `ct2-transformers-converter --model Rostlab/prot_t5_xl_uniref50 --output_dir ./prot-t5-ct2/`, there was an exception: You're trying to...

**CANN Backend support** **Introduction** `CANN` (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI scenarios. It provides multi-layer programming interfaces to help...

I used a CTranslate2-quantized version of fastchat-t5 (https://huggingface.co/limcheekin/fastchat-t5-3b-ct2) as the LLM of a question answering system. The QA system is wrapped in a REST API. The model works really well. But an...

`CANN` (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI scenarios. It provides multi-layer programming interfaces to help users quickly build AI applications and...