gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

132 gpt-fast issues, sorted by most recently updated

I loved seeing the blog post with a simple, standalone implementation of many techniques used in production to speed up LLMs. Would love to see this extended to MoE like...

Extend existing device variable to support code gen for other targets. New arg to `generate`: `--use_sdpa`. By default, SDPA is disabled for best performance on CUDA. CPU presents...

CLA Signed
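For reference, a minimal sketch of the kind of switch such a `use_sdpa` arg could drive, assuming it is threaded down to the attention call; only the flag name comes from the PR above, the wiring below is an assumption:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, use_sdpa: bool = False):
    # q, k, v: (batch, heads, seq, head_dim)
    if use_sdpa:
        # Fused scaled-dot-product kernel; assumption: this is the faster
        # path on CPU backends
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Explicit math path, which torch.compile fuses well on CUDA
    scale = q.size(-1) ** -0.5
    scores = q @ k.transpose(-2, -1) * scale
    causal = torch.ones(q.size(-2), k.size(-2), dtype=torch.bool, device=q.device).tril()
    scores = scores.masked_fill(~causal, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```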

This PR fixes the checkpoint conversion scripts for the `.safetensors` checkpoints on https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1. This is necessary because (1) safetensors are loaded differently from `.pt` files, and (2) the `.pt` files and...

CLA Signed
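For context, a hedged sketch of the branching such a fix implies, since `.safetensors` shards go through safetensors' own reader rather than `torch.load`; the helper below is illustrative, not the PR's actual diff:

```python
from pathlib import Path
import torch
from safetensors.torch import load_file

def load_shard(path: Path) -> dict:
    """Load one checkpoint shard as a dict of CPU tensors."""
    if path.suffix == ".safetensors":
        # safetensors uses its own reader and returns CPU tensors directly
        return load_file(str(path))
    # Legacy .pt shards still go through torch.load
    return torch.load(str(path), map_location="cpu", weights_only=True)
```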

I'm using `torch.__version__ = 2.1.0a0+32f93b1`, which doesn't have this op: `AttributeError: '_OpNamespace' 'aten' object has no attribute '_convert_weight_to_int4pack'`. What exactly does it do, and is it defined elsewhere? Unfortunately, upgrading to the...
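`_convert_weight_to_int4pack` packs quantized weights into the layout expected by PyTorch's fused int4 weight-only matmul, and it only exists in newer builds. A minimal sketch of a version guard, assuming only public `torch.ops` behavior:

```python
import torch

# torch.ops namespaces resolve lazily; hasattr returns False on builds
# where the op hasn't landed, instead of raising AttributeError
if not hasattr(torch.ops.aten, "_convert_weight_to_int4pack"):
    raise RuntimeError(
        "this PyTorch build lacks aten._convert_weight_to_int4pack; "
        "the int4 quantization path needs a newer nightly"
    )
```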

Added a `scaling_factor` to the rotary embedding calculation, for use with models like [DeepSeek](https://github.com/deepseek-ai/), which uses LlamaLinearScalingRotaryEmbedding. The only difference is that the freqs in `precompute_freqs_cis` are divided...

CLA Signed
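A minimal sketch of that change, mirroring the shape of gpt-fast's `precompute_freqs_cis`; the division by `scaling_factor` is the behavior the PR describes, the surrounding code is assumed from the stock helper:

```python
import torch

def precompute_freqs_cis(seq_len: int, n_elem: int, base: int = 10000,
                         scaling_factor: float = 1.0) -> torch.Tensor:
    freqs = 1.0 / (base ** (torch.arange(0, n_elem, 2)[: n_elem // 2].float() / n_elem))
    t = torch.arange(seq_len, device=freqs.device).float()
    # Linear RoPE scaling: dividing the position-frequency products by
    # scaling_factor stretches the usable context window
    freqs = torch.outer(t, freqs) / scaling_factor
    freqs_cis = torch.polar(torch.ones_like(freqs), freqs)
    return torch.stack([freqs_cis.real, freqs_cis.imag], dim=-1)
```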

Some preliminary perf numbers: TP=8, fp16, 163.69 tok/s

CLA Signed

`torch.compile` always re-compiles a function from scratch in a new Python session, which takes a lot of time. I'm wondering if there's a way to cache the compilation result in...
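One possible avenue (an assumption, not a confirmed answer) is inductor's on-disk caching, which recent PyTorch releases expose through environment variables; exactly what gets cached and reused across sessions varies by version:

```python
import os

# Must be set before inductor initializes; point at a persistent path
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/path/to/cache"  # hypothetical location
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"  # cache compiled FX graphs on disk

import torch

@torch.compile
def f(x):
    return torch.sin(x) + torch.cos(x)

f(torch.randn(8))  # first session compiles; later sessions can reload from the cache
```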

**Bug Report.** I encountered a bug when attempting to convert a model from Hugging Face (HF) using the provided code. The issue appears to be related to counting...