
Fast inference engine for Transformer models

Results: 173 CTranslate2 issues

When both oneDNN and OpenBLAS are enabled, `wyoming-faster-whisper` does not start; it works only with OpenBLAS enabled: https://github.com/NixOS/nixpkgs/pull/237788#issuecomment-1592070847 On the other hand, `libretranslate` also breaks when oneDNN and OpenBLAS...

When converting the model, I enabled 'int8' quantization, but I noticed a 5-point BLEU drop in the converted model. Therefore, I...
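A small accuracy loss like this is expected with int8 weight quantization, since each weight is rounded to one of 256 levels. As a minimal illustration (not CTranslate2's actual kernel, just the standard symmetric per-row scheme), the rounding error introduced is bounded by half of the per-row scale:

```python
import numpy as np

def quantize_int8(w):
    # symmetric per-row int8 quantization: the largest |value| in each row maps to 127
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scale = quantize_int8(w)
# the round-trip error is bounded by half of the largest row scale
max_err = np.abs(dequantize(q, scale) - w).max()
```

Whether that per-weight error translates into a measurable BLEU drop depends on the model; rows with large outlier values get coarser scales and therefore larger errors.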

Support adding stop words as a sequence of tokens. For example, in Python code generation, it's fairly common to add the following tokens as stop words: ```py stop_words = [ "\ndef", "\nclass",...

enhancement
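The stop-word request above amounts to a per-step check on the decoded token ids. A minimal sketch of such a check (the token ids below are hypothetical placeholders, not a real tokenization):

```python
def ends_with_stop(generated_ids, stop_sequences):
    # True when the tail of generated_ids matches any stop token sequence;
    # a decoding loop could run this after each generated token and halt early
    for stop in stop_sequences:
        if len(generated_ids) >= len(stop) and generated_ids[-len(stop):] == stop:
            return True
    return False

# hypothetical ids standing in for the tokenizations of "\ndef" and "\nclass"
stop_sequences = [[198, 4299], [198, 4871]]
```

Matching on token sequences rather than decoded strings avoids re-detokenizing the output at every step.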

DialoGPT reads a user input and then generates text until the EOS token appears. But for DialoGPT, EOS can appear multiple times, and after every response, to keep the context you...

enhancement
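Keeping the context across turns typically means re-feeding the whole conversation, with EOS separating every turn, while staying under the model's context length. A minimal sketch (the EOS id is the GPT-2 family value DialoGPT uses; the truncation strategy is an assumption):

```python
EOS = 50256  # GPT-2 family end-of-text token id, used by DialoGPT as the turn separator

def build_context(turns, max_tokens=1024):
    # flatten the conversation, appending EOS after every turn so the model
    # sees the same separator it was trained with; keep only the newest tokens
    flat = []
    for turn in turns:
        flat.extend(turn)
        flat.append(EOS)
    return flat[-max_tokens:]
```

Generation then stops at the first EOS of the new response, and that response is appended as the next turn.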

Assisted Generation feature from https://huggingface.co/blog/assisted-generation It seems to improve quantized inference latency a lot.

enhancement
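The idea behind assisted generation is that a small draft model proposes several tokens and the large target model only verifies them, accepting the longest matching prefix. A toy sketch of the greedy variant (the `draft_next`/`target_next` callables are hypothetical stand-ins for the two models; a real engine scores all proposals with the target in a single forward pass rather than one call per token):

```python
def assisted_step(target_next, draft_next, ids, num_draft=4):
    # draft_next/target_next map a token list to that model's next greedy token
    ctx = list(ids)
    proposed = []
    for _ in range(num_draft):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # accept draft tokens while the target model agrees; on the first
    # mismatch, emit the target's own token instead and stop
    out = list(ids)
    for tok in proposed:
        target_tok = target_next(out)
        out.append(target_tok)
        if target_tok != tok:
            break
    return out
```

When the two models agree often, one target pass yields several tokens, which is why the latency win is largest when the target model is the bottleneck (e.g. quantized large models).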

A new family of Salesforce coding models has been released https://huggingface.co/Salesforce/codet5p-2b/ Context: [CodeT5+](https://github.com/salesforce/CodeT5/tree/main/CodeT5+) is a new family of open code large language models with an encoder-decoder architecture that can flexibly...

enhancement

framework: OpenNMT-tf
model:
```
class TinyDualSourceTransformer(onmt.models.Transformer):
    def __init__(self):
        super(TinyDualSourceTransformer, self).__init__(
            source_inputter=onmt.inputters.ParallelInputter([
                onmt.inputters.WordEmbedder(embedding_size=256),
                onmt.inputters.WordEmbedder(embedding_size=256)]),
            target_inputter=onmt.inputters.WordEmbedder(embedding_size=256),
            num_layers=4,
            num_units=128,
            num_heads=4,
            ffn_inner_dim=512,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            share_encoders=True)

    def auto_config(self, num_replicas=1):
        config = super(TinyDualSourceTransformer, self).auto_config(num_replicas=num_replicas)
        max_length...
```

enhancement

Hello team, Today, batch generation works like the HF generate() function: it accepts several input texts, but the generation parameters (like temperature, top-k, etc.) apply to the whole batch, so...

enhancement
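Until per-example parameters are supported natively, one workaround is to group requests that share identical generation parameters and issue one batched call per group. A minimal sketch of that grouping (the request shape and helper name are assumptions for illustration):

```python
from collections import defaultdict

def group_by_params(requests):
    # requests: list of (prompt, params_dict) pairs; prompts that share
    # identical generation parameters can still be batched into one call
    groups = defaultdict(list)
    for prompt, params in requests:
        key = tuple(sorted(params.items()))  # hashable, order-independent key
        groups[key].append(prompt)
    return dict(groups)
```

This recovers some batching efficiency, but degenerates to one call per request when every request uses different parameters, which is why native per-example support would still help.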