
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Results: 132 gpt-fast issues, sorted by recently updated

Hi maintainers @yanboliang @Chillee, I saw that Int8 weight-only quantization is enabled for Mixtral 8x7B, and the next step should be supporting int4 and int4-gptq. May I know the timeline...
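(For context: weight-only quantization stores the linear weights as integers with per-channel scales and dequantizes them on the fly during the matmul, while activations stay in high precision. A minimal int8 sketch of the idea, illustrative only and not gpt-fast's actual `quantize.py`:)

```python
import torch
import torch.nn.functional as F

def quantize_weight_int8(w: torch.Tensor):
    # Symmetric per-output-channel quantization: one floating-point scale per row.
    scales = w.abs().amax(dim=1, keepdim=True) / 127.0
    w_int8 = torch.round(w / scales).clamp(-128, 127).to(torch.int8)
    return w_int8, scales.squeeze(1)

def int8_linear(x: torch.Tensor, w_int8: torch.Tensor, scales: torch.Tensor):
    # Dequantize on the fly, then rescale the output channels.
    return F.linear(x, w_int8.to(dtype=x.dtype)) * scales

w = torch.randn(256, 512)
w_q, s = quantize_weight_int8(w)
x = torch.randn(4, 512)
print((int8_linear(x, w_q, s) - x @ w.t()).abs().max())  # small quantization error
```

Roughly speaking, int4 follows the same recipe but packs two 4-bit values per byte, and int4-gptq additionally calibrates the rounding against activations from sample data.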

…them and remapping their keys. Integrating loading, merging, and remapping into one step reduces the overall processing time by minimizing redundant operations. Pre-compiling the regular expression used for identifying and...
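(A sketch of what such a fused pass can look like; the `model.` prefix pattern and shard format here are hypothetical, not the PR's actual code:)

```python
import re
import torch

# Compiled once and reused for every key, instead of re-parsing the pattern
# per key; the pattern itself is a made-up example (strip a "model." prefix).
_KEY_RE = re.compile(r"^model\.")

def load_merge_remap(shard_paths):
    merged = {}
    for path in shard_paths:
        shard = torch.load(path, map_location="cpu")
        for key, tensor in shard.items():
            # Remap while merging, so each state dict is walked only once.
            merged[_KEY_RE.sub("", key)] = tensor
    return merged
```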

CLA Signed

Hi! I tried to convert `princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT` but it failed: ``` ❯ ./scripts/prepare.sh $MODEL_REPO (gptfast) README.md: 100%|██████████| 1.37k/1.37k [00:00

I can run the script successfully as explained in the repository, such as creating a quantized model and then running it with generate.py. However, the actual issue arises when I...

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #102 Summary: att. Adding this for accuracy evaluation; we also added this in the executorch repo and we'll dedup later. Test Plan: quantization:...

CLA Signed

Hi! How does perplexity (ppl) compare between fp16 and your int4?
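(One common way to make such a comparison is token-level perplexity on a held-out text; a minimal sketch, assuming `model` maps a `(1, seq_len)` tensor of token ids to logits, which is not exactly gpt-fast's `forward` signature:)

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, token_ids: torch.Tensor) -> float:
    # Predict each token from its prefix and average the negative log-likelihood.
    logits = model(token_ids[:, :-1])         # (1, seq_len - 1, vocab)
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten positions
        token_ids[:, 1:].reshape(-1),         # shifted targets
    )
    return nll.exp().item()                   # ppl = exp(mean NLL)
```

Running this on the same tokens with the fp16 and int4 checkpoints gives the ppl gap directly.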

I wrote a simple test to get the Triton code of `WeightOnlyInt8Linear`; the test code is as follows: ``` import torch import torch.nn as nn import torch.nn.functional as F class WeightOnlyInt8Linear(torch.nn.Module):...
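(One way to get at Inductor's generated Triton kernels is to compile the module and enable output-code logging; a minimal sketch, with a stand-in module body since the excerpt above is truncated and may differ from gpt-fast's actual class:)

```python
import torch
import torch.nn.functional as F

class WeightOnlyInt8Linear(torch.nn.Module):
    # Stand-in: int8 weights plus per-channel scales, dequantized in forward.
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.register_buffer(
            "weight",
            torch.randint(-128, 127, (out_features, in_features), dtype=torch.int8),
        )
        self.register_buffer("scales", torch.ones(out_features))

    def forward(self, x):
        return F.linear(x, self.weight.to(dtype=x.dtype)) * self.scales

# Inductor emits Triton kernels on GPU; run with TORCH_LOGS="output_code"
# to print the generated code, e.g.  TORCH_LOGS=output_code python test.py
mod = torch.compile(WeightOnlyInt8Linear(512, 512).cuda())
mod(torch.randn(4, 512, device="cuda"))
```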