gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

132 gpt-fast issues, sorted by most recently updated

I was comparing the rotary embedding implementation in this repository with the implementations in the official Llama and DeepSeek repositories, using this Jupyter notebook: [link](https://colab.research.google.com/drive/1I9aBN55UUgmUwSNTmELC1u7DWuEk1dU2?usp=sharing). In the Llama and DeepSeek repositories,...
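For context on where such discrepancies usually come from, here is a minimal, self-contained sketch (my own illustration, not code from this repository or from the Llama/DeepSeek repositories; the function names are made up for the example) of the two common RoPE layouts. The "interleaved" variant rotates adjacent dimension pairs, while the "half-split" (rotate_half) variant pairs dimension i with i + d/2; they agree only up to a permutation of the head dimensions, so a checkpoint trained under one convention must either be run under that same convention or have its q/k projection weights permuted.

```
import torch

def rope_interleaved(x: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    # Rotate adjacent pairs (x0, x1), (x2, x3), ... -- the layout gpt-fast and
    # the original Meta Llama code use. x: (..., seq_len, head_dim).
    d = x.shape[-1]
    pos = torch.arange(x.shape[-2], dtype=torch.float32)
    freqs = 1.0 / theta ** (torch.arange(0, d, 2, dtype=torch.float32) / d)
    ang = torch.outer(pos, freqs)                    # (seq_len, d/2)
    cos, sin = ang.cos(), ang.sin()
    xp = x.float().reshape(*x.shape[:-1], -1, 2)     # (..., seq_len, d/2, 2)
    out = torch.stack(
        [xp[..., 0] * cos - xp[..., 1] * sin,
         xp[..., 0] * sin + xp[..., 1] * cos], dim=-1)
    return out.flatten(-2).type_as(x)

def rope_half_split(x: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    # Pair dimension i with i + d/2 ("rotate_half"), as in the Hugging Face
    # Llama implementation. Same frequencies, different dimension pairing.
    d = x.shape[-1]
    pos = torch.arange(x.shape[-2], dtype=torch.float32)
    freqs = 1.0 / theta ** (torch.arange(0, d, 2, dtype=torch.float32) / d)
    ang = torch.outer(pos, freqs)                    # (seq_len, d/2)
    cos = torch.cat([ang.cos(), ang.cos()], dim=-1)  # (seq_len, d)
    sin = torch.cat([ang.sin(), ang.sin()], dim=-1)
    x1, x2 = x[..., : d // 2], x[..., d // 2:]
    return (x.float() * cos + torch.cat([-x2, x1], dim=-1).float() * sin).type_as(x)
```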

```
(/home/bobren/local/a/pytorch-env) [15:08] devgpu035:/home/bobren/local/a/gpt-fast
python eval.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model.pth
Loading model ...
Time to load model: 6.96 seconds.
README.md: 100%|████████████████| 6.84k/6.84k [00:00
```

I see from the source code that this method needs the ModelArgs from model.py, but I want to use it with a DeepSeek model. How should I go about that?
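A hedged sketch of one possible route, assuming gpt-fast's ModelArgs / transformer_configs structure in model.py: for a dense, Llama-style DeepSeek checkpoint you could register a new named config and build ModelArgs from it. The hyperparameter values below are placeholders to be filled in from the model's config.json, and MoE/MLA DeepSeek variants would need architecture changes beyond what ModelArgs covers.

```
# "deepseek-llm-7b" is an illustrative key; the numbers are placeholders, not
# the real DeepSeek hyperparameters.
from model import ModelArgs, transformer_configs  # gpt-fast's model.py

transformer_configs["deepseek-llm-7b"] = dict(
    n_layer=30,               # placeholder: set from config.json
    n_head=32,                # placeholder
    dim=4096,                 # placeholder
    intermediate_size=11008,  # placeholder
    vocab_size=102400,        # placeholder
    rope_base=10000,
)

args = ModelArgs.from_name("deepseek-llm-7b")
```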

I tried the following, and it seems to break right now:
```
> python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4 --groupsize 64
Loading model ...
Quantizing model weights for int4 weight-only affine...
```

When using int8 quantization, there is a significant performance drop in multi-batch inference compared to single-batch inference. The single-batch performance is good, but the performance doesn't scale well with increased...
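One way to narrow down whether the regression lives in the quantized linear itself is a micro-benchmark along these lines. This is a rough sketch assuming a CUDA GPU and bf16 activations; the int8 path mimics a dequantize-on-the-fly weight-only linear (F.linear on upcast weights, then per-channel scales), which is cheap when bs=1 is memory-bound but pays the upcast cost at larger batch sizes.

```
import time

import torch
import torch.nn.functional as F

def bench(fn, iters=50):
    # Warm up, then time with CUDA synchronization around the measured region.
    for _ in range(5):
        fn()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

dim_in, dim_out = 4096, 4096  # illustrative layer size, not tied to any model
w_bf16 = torch.randn(dim_out, dim_in, dtype=torch.bfloat16, device="cuda")
w_int8 = torch.randint(-128, 128, (dim_out, dim_in), dtype=torch.int8, device="cuda")
scales = torch.randn(dim_out, dtype=torch.bfloat16, device="cuda")

for bs in (1, 4, 16):
    x = torch.randn(bs, dim_in, dtype=torch.bfloat16, device="cuda")
    t_ref = bench(lambda: F.linear(x, w_bf16))
    # int8 weight-only path: upcast weights on the fly, apply per-channel scales
    t_q = bench(lambda: F.linear(x, w_int8.to(x.dtype)) * scales)
    print(f"bs={bs}: bf16 {t_ref * 1e6:.1f} us, int8 {t_q * 1e6:.1f} us")
```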

Summary: adds torchao APIs to gpt-fast, plus some minor tweaks.

Test Plan: (in progress)
```
export MODEL_REPO=meta-llama/Meta-Llama-3-8B
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int8
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth --compile
python generate.py --checkpoint_path...
```

CLA Signed

Hey, thanks for providing the gpt-fast project. I am getting an error when trying to run inference. I have fine-tuned a llama-3.1-70B model with LoRA using torchtune, and converted the checkpoint...

Has anyone run this code with batch size > 1 and speculative decoding?

Am I right that there is a mistake here, at line 191 of generate.py? For batch > 1, cur_token will have more than one element, so next_token.view(()) will raise an error.
```
if...
```
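A minimal repro of the reported failure mode, plus one possible batch-safe alternative (a sketch, not a fix from the repository):

```
import torch

next_token = torch.tensor([42])      # batch size 1
print(next_token.view(()))           # fine: reshapes to a 0-dim tensor

next_token = torch.tensor([42, 7])   # batch size 2
try:
    next_token.view(())              # can't collapse 2 elements to a scalar
except RuntimeError as e:
    print(e)                         # shape '[]' is invalid for input of size 2

# One batch-safe alternative: keep a trailing dimension per sequence instead
# of collapsing to a scalar.
print(next_token.view(-1, 1).shape)  # torch.Size([2, 1])
```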

When I use meta-llama/Llama-3.2-1B, I get the error below. Can it be fixed?
```
RuntimeError: Error(s) in loading state_dict for Transformer:
Missing key(s) in state_dict: "tok_embeddings.weight", "layers.0.attention.wqkv.weight", "layers.0.attention.wo.weight", "layers.0.feed_forward.w1.weight", "layers.0.feed_forward.w3.weight", "layers.0.feed_forward.w2.weight", "layers.0.ffn_norm.weight", "layers.0.attention_norm.weight", "layers.1.attention.wqkv.weight", "layers.1.attention.wo.weight",...
```
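Missing fused keys like layers.0.attention.wqkv.weight typically mean a raw Hugging Face checkpoint was never converted into gpt-fast's fused-QKV layout. Two things worth checking, assuming the standard gpt-fast workflow: that the conversion script was run before generate.py, and that model.py's transformer_configs has an entry matching the model name.

```
export MODEL_REPO=meta-llama/Llama-3.2-1B
python scripts/download.py --repo_id $MODEL_REPO
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/$MODEL_REPO
```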