gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
:843: _call_with_frames_removed: block: [8,0,0], thread: [58,0,0] Assertion `index out of bounds: 0...
I only got a little improvement over the native code. Was there anything I missed? # Commands **cli 1:** time python generate.py --compile --compile_prefill --checkpoint_path /root/gpt-fast/codellama-34b-python/model_int8.pth --prompt "def quicksort(arr):" --max_new_tokens...
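One common cause of small measured speedups (a hedged sketch, not gpt-fast's own benchmarking code): timing the whole run with `time` includes `torch.compile`'s one-off compilation cost, so it helps to measure tokens/sec only after a warm-up run. `generate`, `model`, and `prompt_tokens` below are assumed stand-ins for gpt-fast's setup.

```python
import time
import torch

# warm-up: the first calls trigger compilation and CUDA graph capture
for _ in range(2):
    generate(model, prompt_tokens, max_new_tokens=10)

torch.cuda.synchronize()
t0 = time.perf_counter()
out = generate(model, prompt_tokens, max_new_tokens=200)
torch.cuda.synchronize()
dt = time.perf_counter() - t0
print(f"{(out.numel() - prompt_tokens.numel()) / dt:.1f} tokens/sec")
```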
Small models on HF don't have pytorch_model.bin.index.json files, since they are unnecessary. I changed convert_hf_checkpoint.py to allow a single pytorch_model.bin file as the model description. I added PY007/TinyLlama-1.1B-intermediate-step-480k-1T to...
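A minimal sketch of that fallback, assuming the usual HF layout: prefer the shard index when present, otherwise load the single un-sharded pytorch_model.bin. `load_hf_state_dict` is a hypothetical helper, not gpt-fast's actual code.

```python
import json
from pathlib import Path

import torch

def load_hf_state_dict(checkpoint_dir: Path) -> dict:
    index_path = checkpoint_dir / "pytorch_model.bin.index.json"
    if index_path.is_file():
        # sharded checkpoint: merge every shard listed in the weight map
        weight_map = json.loads(index_path.read_text())["weight_map"]
        state_dict = {}
        for shard in sorted(set(weight_map.values())):
            state_dict.update(torch.load(checkpoint_dir / shard, map_location="cpu"))
        return state_dict
    # small models ship a single un-sharded file and no index
    return torch.load(checkpoint_dir / "pytorch_model.bin", map_location="cpu")
```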
I quantized the model to int8 and it gave this error: ubuntu@ip-172-31-19-240:~/gpt-fast$ python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8 Loading model ... /opt/conda/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in...
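Note that the TypedStorage message is a deprecation warning from torch.load, not a failure. For context, here is a minimal sketch of the kind of weight-only int8 quantization quantize.py's int8 mode performs (per-channel scales); the helper names are illustrative, not gpt-fast's.

```python
import torch

def quantize_int8_per_channel(w: torch.Tensor):
    # one scale per output channel, chosen so weights span [-127, 127]
    scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scales), -128, 127).to(torch.int8)
    return q, scales.squeeze(1)

def dequantize_int8(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # recover an approximation of the original float weights
    return q.to(torch.float32) * scales.unsqueeze(1)
```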
Can you help explain this comment? What is the best setup for `torch.compile`? https://github.com/pytorch-labs/gpt-fast/blob/db7b273ab86b75358bd3b014f1f022a19aba4797/generate.py#L64-L67
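For reference, this is roughly the setup generate.py uses around those lines, as I understand it: the single-token decode step is compiled with mode="reduce-overhead" (CUDA graphs) and fullgraph=True, while prefill gets dynamic=True because the prompt length varies between calls. `setup_compile` below is a hypothetical wrapper, not gpt-fast's actual function.

```python
import torch

def setup_compile(decode_one_token, prefill, compile_prefill: bool = False):
    # decode runs with static shapes every step, so "reduce-overhead"
    # (CUDA graphs) removes per-token kernel-launch overhead
    decode_one_token = torch.compile(
        decode_one_token, mode="reduce-overhead", fullgraph=True
    )
    if compile_prefill:
        # prompt length changes between calls, so compile with dynamic shapes
        prefill = torch.compile(prefill, fullgraph=True, dynamic=True)
    return decode_one_token, prefill
```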
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'), Model config {'block_size': 2048, 'vocab_size': 32000, 'n_layer': 32, 'n_head': 32, 'dim': 4096, 'intermediate_size': 11008, 'n_local_heads': 32, 'head_dim': 128, 'rope_base': 10000, 'norm_eps': 1e-05} /mnt/user/wangchenpeng/venv/fast/lib/python3.8/site-packages/torch/_utils.py:831: UserWarning:...
Hi, are the GPTQ-converted models the same as AutoGPTQ's? Do they share the same configuration settings, such as group size, act-order, and so on? There are plenty of GPTQ...
Would it be hard to adapt this code for Mistral? I tried the OpenOrca version and set vocab_size in the config to 32002, but the shapes did not match: ``` File "/experiments/dev/nsherstnev/gpt-fast/scripts/convert_hf_checkpoint.py",...
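If it helps, Mistral-7B's published hyperparameters would map onto gpt-fast's ModelArgs roughly as below; the key name and its placement in model.py's transformer_configs are assumptions. Note Mistral uses grouped-query attention (8 KV heads), which plain Llama-7B configs don't, so a mismatched n_local_heads is another likely source of shape errors.

```python
# Hypothetical addition to transformer_configs in model.py; values are from
# Mistral-7B v0.1's published config. The OpenOrca fine-tune extends the
# vocabulary by two tokens, hence vocab_size=32002 there.
"Mistral-7B": dict(
    n_layer=32,
    n_head=32,
    n_local_heads=8,          # grouped-query attention: 8 KV heads
    dim=4096,
    intermediate_size=14336,
    vocab_size=32000,         # 32002 for the OpenOrca fine-tune
    rope_base=10000,
),
```

Also be aware that gpt-fast has no sliding-window attention, so long contexts would behave differently from the reference Mistral implementation.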
Hi, thanks for the great work! I have a question: is the code in L172-L173 (model.py) doing position embedding, like the red box in the picture below? (screenshot omitted) If it is...
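For what it's worth, those lines apply rotary position embeddings (RoPE) to the query and key tensors, rather than adding a learned position embedding. A simplified, self-contained sketch of the rotation, assuming gpt-fast's (batch, seq, heads, head_dim) layout:

```python
import torch

def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq, n_heads, head_dim); freqs_cis: (seq, head_dim // 2, 2)
    # pair adjacent channels and rotate each pair by a position-dependent angle
    xshaped = x.float().reshape(*x.shape[:-1], -1, 2)
    freqs_cis = freqs_cis.view(1, xshaped.size(1), 1, xshaped.size(3), 2)
    x_out = torch.stack(
        [
            xshaped[..., 0] * freqs_cis[..., 0] - xshaped[..., 1] * freqs_cis[..., 1],
            xshaped[..., 1] * freqs_cis[..., 0] + xshaped[..., 0] * freqs_cis[..., 1],
        ],
        dim=-1,
    )
    return x_out.flatten(3).type_as(x)
```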
Hi all, I'm using the `llama-2-7b-chat` model. I tried to run `generate.py` with the following command-line parameters: `--compile --checkpoint_path llama/llama-2-7b-chat/consolidated.00.pth --prompt "Hello, my name is"` and got the below output/error: >Loading model...