gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
:843: _call_with_frames_removed: block: [8,0,0], thread: [58,0,0] Assertion `index out of bounds: 0...
I only got a little improvement over the native code. Was there anything I missed? # Commands **cli 1:** time python generate.py --compile --compile_prefill --checkpoint_path /root/gpt-fast/codellama-34b-python/model_int8.pth --prompt "def quicksort(arr):" --max_new_tokens...
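One common cause of small measured speedups (a hedged sketch, not gpt-fast's own benchmarking code): timing the whole run with `time` includes `torch.compile`'s one-off compilation cost, so it helps to measure tokens/sec only after a warm-up run. `generate`, `model`, and `prompt_tokens` below are assumed stand-ins for gpt-fast's setup.

```python
import time
import torch

# warm-up: the first calls trigger compilation and CUDA graph capture
for _ in range(2):
    generate(model, prompt_tokens, max_new_tokens=10)

torch.cuda.synchronize()
t0 = time.perf_counter()
out = generate(model, prompt_tokens, max_new_tokens=200)
torch.cuda.synchronize()
dt = time.perf_counter() - t0
print(f"{(out.numel() - prompt_tokens.numel()) / dt:.1f} tokens/sec")
```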
Small models on HF don't have pytorch_model.bin.index.json files, since they are unnecessary. I changed convert_hf_checkpoint.py to allow a single pytorch_model.bin file as the model description. I added PY007/TinyLlama-1.1B-intermediate-step-480k-1T to...
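A minimal sketch of that fallback, assuming the usual HF layout: prefer the shard index when present, otherwise load the single un-sharded pytorch_model.bin. `load_hf_state_dict` is a hypothetical helper, not gpt-fast's actual code.

```python
import json
from pathlib import Path

import torch

def load_hf_state_dict(checkpoint_dir: Path) -> dict:
    index_path = checkpoint_dir / "pytorch_model.bin.index.json"
    if index_path.is_file():
        # sharded checkpoint: merge every shard listed in the weight map
        weight_map = json.loads(index_path.read_text())["weight_map"]
        state_dict = {}
        for shard in sorted(set(weight_map.values())):
            state_dict.update(torch.load(checkpoint_dir / shard, map_location="cpu"))
        return state_dict
    # small models ship a single un-sharded file and no index
    return torch.load(checkpoint_dir / "pytorch_model.bin", map_location="cpu")
```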
I quantized the model to int8 and it gave this error: ubuntu@ip-172-31-19-240:~/gpt-fast$ python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8 Loading model ... /opt/conda/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in...
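Note that the TypedStorage message is a deprecation warning from torch.load, not a failure. For context, here is a minimal sketch of the kind of weight-only int8 quantization quantize.py's int8 mode performs (per-channel scales); the helper names are illustrative, not gpt-fast's.

```python
import torch

def quantize_int8_per_channel(w: torch.Tensor):
    # one scale per output channel, chosen so weights span [-127, 127]
    scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scales), -128, 127).to(torch.int8)
    return q, scales.squeeze(1)

def dequantize_int8(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # recover an approximation of the original float weights
    return q.to(torch.float32) * scales.unsqueeze(1)
```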
Can you help explain this comment? What is the best setup for `torch.compile`? https://github.com/pytorch-labs/gpt-fast/blob/db7b273ab86b75358bd3b014f1f022a19aba4797/generate.py#L64-L67
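For reference, this is roughly the setup generate.py uses around those lines, as I understand it: the single-token decode step is compiled with mode="reduce-overhead" (CUDA graphs) and fullgraph=True, while prefill gets dynamic=True because the prompt length varies between calls. `setup_compile` below is a hypothetical wrapper, not gpt-fast's actual function.

```python
import torch

def setup_compile(decode_one_token, prefill, compile_prefill: bool = False):
    # decode runs with static shapes every step, so "reduce-overhead"
    # (CUDA graphs) removes per-token kernel-launch overhead
    decode_one_token = torch.compile(
        decode_one_token, mode="reduce-overhead", fullgraph=True
    )
    if compile_prefill:
        # prompt length changes between calls, so compile with dynamic shapes
        prefill = torch.compile(prefill, fullgraph=True, dynamic=True)
    return decode_one_token, prefill
```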
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'), Model config {'block_size': 2048, 'vocab_size': 32000, 'n_layer': 32, 'n_head': 32, 'dim': 4096, 'intermediate_size': 11008, 'n_local_heads': 32, 'head_dim': 128, 'rope_base': 10000, 'norm_eps': 1e-05} /mnt/user/wangchenpeng/venv/fast/lib/python3.8/site-packages/torch/_utils.py:831: UserWarning:...
Hi, are the GPTQ-converted models the same as AutoGPTQ's? Do they share the same configuration settings, such as group size, act-order, and so on? There are plenty of GPTQ...
Would it be hard to adapt this code for Mistral? I tried the OpenOrca version and set vocab_size in the config to 32002, but the shapes did not match: ``` File "/experiments/dev/nsherstnev/gpt-fast/scripts/convert_hf_checkpoint.py",...
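If it helps, Mistral-7B's published hyperparameters would map onto gpt-fast's ModelArgs roughly as below; the key name and its placement in model.py's transformer_configs are assumptions. Note Mistral uses grouped-query attention (8 KV heads), which plain Llama-7B configs don't, so a mismatched n_local_heads is another likely source of shape errors.

```python
# Hypothetical addition to transformer_configs in model.py; values are from
# Mistral-7B v0.1's published config. The OpenOrca fine-tune extends the
# vocabulary by two tokens, hence vocab_size=32002 there.
"Mistral-7B": dict(
    n_layer=32,
    n_head=32,
    n_local_heads=8,          # grouped-query attention: 8 KV heads
    dim=4096,
    intermediate_size=14336,
    vocab_size=32000,         # 32002 for the OpenOrca fine-tune
    rope_base=10000,
),
```

Also be aware that gpt-fast has no sliding-window attention, so long contexts would behave differently from the reference Mistral implementation.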
Hi, thanks for the great work! I have a question: is the code in L172-L173 (model.py) doing position embedding, like the red box in the picture below? (screenshot omitted) If it is...
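For what it's worth, those lines apply rotary position embeddings (RoPE) to the query and key tensors, rather than adding a learned position embedding. A simplified, self-contained sketch of the rotation, assuming gpt-fast's (batch, seq, heads, head_dim) layout:

```python
import torch

def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq, n_heads, head_dim); freqs_cis: (seq, head_dim // 2, 2)
    # pair adjacent channels and rotate each pair by a position-dependent angle
    xshaped = x.float().reshape(*x.shape[:-1], -1, 2)
    freqs_cis = freqs_cis.view(1, xshaped.size(1), 1, xshaped.size(3), 2)
    x_out = torch.stack(
        [
            xshaped[..., 0] * freqs_cis[..., 0] - xshaped[..., 1] * freqs_cis[..., 1],
            xshaped[..., 1] * freqs_cis[..., 0] + xshaped[..., 0] * freqs_cis[..., 1],
        ],
        dim=-1,
    )
    return x_out.flatten(3).type_as(x)
```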
Hi all, I'm using the `llama-2-7b-chat` model. I tried to run `generate.py` with the following command-line parameters: `--compile --checkpoint_path llama/llama-2-7b-chat/consolidated.00.pth --prompt "Hello, my name is"` and got the below output/error: >Loading model...