CTranslate2
Fast inference engine for Transformer models
When oneDNN and OpenBLAS are both enabled, `wyoming-faster-whisper` does not start; it works only with OpenBLAS enabled: https://github.com/NixOS/nixpkgs/pull/237788#issuecomment-1592070847 On the other hand, `libretranslate` also breaks when oneDNN and OpenBLAS...
When converting the model, I enabled quantization to `int8`, but I noticed the converted model's performance dropped by 5 BLEU points. Therefore, I...
Support adding stop words as sequences of tokens. For example, in Python code generation, it is fairly common to add the following tokens as stop words: ```py stop_words = [ "\ndef", "\nclass",...
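Until such an option exists, the effect can be approximated after decoding. A minimal sketch (plain Python, independent of any particular inference API; names are illustrative) of truncating generated text at the earliest stop sequence:

```python
# Sketch: cut generated text at the first occurrence of any stop sequence.
# This is post-hoc truncation, not early stopping inside the decoder.

def truncate_at_stop_words(text, stop_words):
    """Return `text` truncated at the earliest stop sequence, if any."""
    cut = len(text)
    for stop in stop_words:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

stop_words = ["\ndef", "\nclass"]
generated = "    return x + 1\ndef helper():\n    pass"
truncate_at_stop_words(generated, stop_words)  # -> "    return x + 1"
```

Note this still pays the cost of generating the discarded tokens; native stop-sequence support in the decoding loop is what the request is asking for.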
DialoGPT reads a user input and then generates text until the EOS token appears. But for DialoGPT, EOS can appear multiple times, and after every response, to keep the context you...
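A hedged sketch of the usual DialoGPT convention, where EOS delimits turns rather than ending the conversation: all previous turns (user and bot, alternating) are joined with the EOS token to form the next prompt.

```python
# Sketch: multi-turn context assembly for DialoGPT-style chat models.
# EOS acts as a turn separator, not an end-of-conversation marker.
EOS = "<|endoftext|>"  # DialoGPT reuses the GPT-2 EOS token

def build_context(turns):
    """Join alternating user/bot turns with EOS, ending with EOS so the
    model generates the next turn."""
    return EOS.join(turns) + EOS

history = ["Hi there!", "Hello, how can I help?", "Tell me a joke."]
build_context(history)
# "Hi there!<|endoftext|>Hello, how can I help?<|endoftext|>Tell me a joke.<|endoftext|>"
```

Generation then stops at the *next* EOS, which closes one turn; the loop repeats with the bot's reply appended to `history`.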
The Assisted Generation feature from https://huggingface.co/blog/assisted-generation seems to improve quantized inference latency a lot.
A new family of Salesforce coding models has been released: https://huggingface.co/Salesforce/codet5p-2b/ Context: [CodeT5+](https://github.com/salesforce/CodeT5/tree/main/CodeT5+) is a new family of open code large language models with an encoder-decoder architecture that can flexibly...
framework: OpenNMT-tf
model:
```python
class TinyDualSourceTransformer(onmt.models.Transformer):
    def __init__(self):
        super(TinyDualSourceTransformer, self).__init__(
            source_inputter=onmt.inputters.ParallelInputter([
                onmt.inputters.WordEmbedder(embedding_size=256),
                onmt.inputters.WordEmbedder(embedding_size=256)]),
            target_inputter=onmt.inputters.WordEmbedder(embedding_size=256),
            num_layers=4,
            num_units=128,
            num_heads=4,
            ffn_inner_dim=512,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            share_encoders=True)

    def auto_config(self, num_replicas=1):
        config = super(TinyDualSourceTransformer, self).auto_config(num_replicas=num_replicas)
        max_length...
```
Hello team, Today, batch generation works like the HF `generate()` function: it accepts several input texts, but generation parameters (like temperature, top-k, etc.) apply to the whole batch, so...
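A common workaround, sketched below in plain Python (names are illustrative), is to group requests by identical sampling parameters so each group can still be submitted as one batch:

```python
# Sketch: bucket requests by their generation parameters so each bucket
# can be batched with a single parameter set. Parameter names are
# illustrative, not tied to any particular API.
from collections import defaultdict

def group_by_params(requests):
    """requests: list of (text, params_dict) pairs.
    Returns a dict mapping a hashable parameter key to the list of texts."""
    groups = defaultdict(list)
    for text, params in requests:
        key = tuple(sorted(params.items()))  # order-insensitive, hashable
        groups[key].append(text)
    return dict(groups)

requests = [
    ("translate A", {"temperature": 0.7, "top_k": 40}),
    ("translate B", {"temperature": 1.0, "top_k": 0}),
    ("translate C", {"top_k": 40, "temperature": 0.7}),  # same params as A
]
group_by_params(requests)
# texts A and C land in one bucket, B in another
```

This preserves some batching efficiency, but true per-example parameters inside one batch would avoid the extra passes entirely, which is what the request above is asking for.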