gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
int4 quantization on CPU causes: ``` Traceback (most recent call last): File "/home/user/gpt-fast/quantize.py", line 622, in quantize(args.checkpoint_path, args.mode, args.groupsize, args.calibration_tasks, args.calibration_limit, args.calibration_seq_length, args.pad_calibration_inputs, args.percdamp, args.blocksize, args.label) File "/home/user/gpt-fast/quantize.py", line 569,...
Implement gpt-fast using the flex_attention HOP. This relies on this PR: https://github.com/pytorch/pytorch/pull/132157
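For context, a minimal sketch of the flex_attention higher-order op from torch.nn.attention.flex_attention (recent PyTorch, 2.5+), not the implementation proposed in this issue; the shapes and the causal mask_mod are illustrative assumptions:
```python
# Sketch only: causal attention via the flex_attention HOP; shapes are made up.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 8, 128, 64
q = torch.randn(B, H, S, D, device="cuda")
k = torch.randn(B, H, S, D, device="cuda")
v = torch.randn(B, H, S, D, device="cuda")

def causal(b, h, q_idx, kv_idx):
    # mask_mod: keep only key positions at or before the query position
    return q_idx >= kv_idx

# B=None / H=None broadcast the mask over batch and heads
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

# flex_attention is compilable; gpt-fast would presumably wrap this in torch.compile.
out = flex_attention(q, k, v, block_mask=block_mask)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```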
For the Llama model, set enable_gqa=True in the sdpa call to use the built-in grouped-query attention functionality
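The built-in GQA path referred to here is the enable_gqa flag on torch.nn.functional.scaled_dot_product_attention (available in recent PyTorch releases). A minimal sketch with made-up shapes, not the actual gpt-fast attention code:
```python
# Sketch: grouped-query attention via scaled_dot_product_attention(enable_gqa=True).
import torch
import torch.nn.functional as F

bsz, seq_len, head_dim = 2, 16, 64
n_heads, n_kv_heads = 32, 8  # query heads must be a multiple of key/value heads

q = torch.randn(bsz, n_heads, seq_len, head_dim)
k = torch.randn(bsz, n_kv_heads, seq_len, head_dim)
v = torch.randn(bsz, n_kv_heads, seq_len, head_dim)

# With enable_gqa=True the kv heads are shared across query-head groups,
# so there is no need to repeat_interleave k and v up to n_heads first.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True, enable_gqa=True)
print(out.shape)  # torch.Size([2, 32, 16, 64])
```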
This PR decouples the int4 weight from its serialized format, so that an int4 model checkpoint can be shared across different test machines or ISAs without re-generating it on one specific platform....
I used the converter here: https://github.com/pytorch-labs/gpt-fast/blob/main/scripts/convert_hf_checkpoint.py but I get this error when trying to convert my Hugging Face checkpoint: ``` swarms@dpm4:~/gpt-fast/scripts$ python3 convert_hf_checkpoint.py --checkpoint_dir /home/swarms/checkpoint-4000/ --model_name large-v3 Traceback (most recent...
I fine-tuned an LLM based on the Llama skeleton and used convert_hf_checkpoint and quantize to complete the quantization. However, when generating, the tokenizer.model file is missing. How should I proceed...
I'm glad torch.compile gives such a large speedup. On an A5000 it is about 60% faster, but there is no acceleration on an L4. I'd like to know why this happens?...
I cloned the gpt-fast repo and tried it out with Llama-3. To set up, I ran the following code: ```bash pip install huggingface_hub[hf_transfer] export HF_HUB_ENABLE_HF_TRANSFER=1 python3 -m pip install -r ./requirements.txt...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #180 Status: - Switched to DTensor-based TP in the regular tensor path - The result is correct, but there is a perf gap...
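For readers unfamiliar with the approach, a minimal sketch of DTensor-based tensor parallelism via parallelize_module; the toy FeedForward module, dimensions, and launch setup are assumptions for illustration, not the code from this PR:
```python
# Sketch: DTensor-based TP for a two-layer MLP, launched with
#   torchrun --nproc_per_node=<world_size> this_script.py
import os
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel, RowwiseParallel, parallelize_module,
)

class FeedForward(nn.Module):  # hypothetical toy module, not gpt-fast's
    def __init__(self, dim=256, hidden=1024):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))

torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
world_size = int(os.environ["WORLD_SIZE"])
mesh = init_device_mesh("cuda", (world_size,))

torch.manual_seed(0)  # same weights and inputs on every rank
model = FeedForward().cuda()

# Colwise-shard the up projection and rowwise-shard the down projection, so the
# intermediate activation stays sharded and only one all-reduce is needed.
model = parallelize_module(model, mesh, {"w1": ColwiseParallel(), "w2": RowwiseParallel()})

out = model(torch.randn(8, 256, device="cuda"))
```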