GPTFast

Accelerate your Hugging Face Transformers 6-7x. Native to Hugging Face and PyTorch.

10 GPTFast issues

Huggingface -> Hugging Face

Very interesting work! I see you pinned `torch==2.1.2`. PyTorch 2.2 promises a 2x improvement to `scaled_dot_product_attention` and a few `torch.compile` improvements: https://pytorch.org/blog/pytorch2-2/ I don't think using PyTorch 2.2 will...
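For context, the kernel the issue refers to is `torch.nn.functional.scaled_dot_product_attention`, which dispatches to fused implementations (e.g. FlashAttention) under the hood. A minimal sketch of a call, with arbitrary toy shapes (this is not GPTFast code, just the PyTorch API the issue discusses):

```python
import torch
import torch.nn.functional as F

# Toy tensors shaped (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Fused attention; is_causal=True applies a causal mask, as in decoding.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```

Since the call signature is unchanged between 2.1 and 2.2, the reported speedups should apply without code changes.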

The current pinned requirements make the install incompatible with newer PyTorch or Transformers versions. They should be set as minimum requirements instead.

Hello, I am having difficulties running GPTFast on Mistral-7b-v0.1, encountering the same errors as reported here: https://github.com/MDK8888/GPTFast/issues/25. My assumption is that the model_config is not set properly (I am currently...

Hi there, thanks for creating this repo. I wanted to know what the config should be for Llama-2-7b-chat-hf, given that it's the below for the gpt and opt architectures: ``` "gpt": { "path_to_blocks":...

Could you help by giving example code to run GPTFast on Mixtral-8x7B-Instruct-v0.1? I load the model with GPTFast with an empty draft_model_name. An error appears when loading the model, as follows....

# To reproduce
- `pip install gpt-fast`
- run the code included in the README
- reinstalling numpy with `!pip install numpy --upgrade` fixes the numpy error, but then there...

Dear Sir, I checked the demo code of GPTFast 0.2.1 and found that the function argmax_variation(...) is not used at all. Could you please explain this? Many thanks.

I am trying to use this project with a vision-language model like https://huggingface.co/docs/transformers/en/model_doc/llava_next, but currently this repo does not support the vision part of the model. I have a separate script...

Hi! I don't quite understand how this project works. I guess my main question is: what is a draft model? For example, I would like to speed up...
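The "draft model" question refers to speculative decoding: a small, fast draft model proposes several tokens cheaply, and the large target model verifies them in a single pass, keeping the accepted prefix. A toy sketch of the control flow, where `draft_model` and `target_model` are hypothetical stand-ins using arithmetic rules in place of real networks (this is not GPTFast's API):

```python
def draft_model(prefix, k=4):
    # Cheap proposer: guesses the next k tokens.
    # Toy rule: just counts up from the last token.
    return [prefix[-1] + i + 1 for i in range(k)]

def target_model(prefix, proposed):
    # Expensive verifier: scores all proposed tokens in one pass.
    # Toy rule: it "agrees" with tokens not divisible by 3; at the
    # first disagreement it emits its own token and stops.
    verified = []
    for tok in proposed:
        if tok % 3 != 0:          # draft token accepted
            verified.append(tok)
        else:                     # rejected: substitute target's token
            verified.append(tok + 3)
            break
    return verified

def speculative_decode(prompt, steps=3, k=4):
    seq = list(prompt)
    for _ in range(steps):
        proposed = draft_model(seq, k)
        seq.extend(target_model(seq, proposed))
    return seq

print(speculative_decode([0], steps=2))  # → [0, 1, 2, 6, 7, 8, 12]
```

The speedup comes from the verifier accepting several draft tokens per forward pass instead of generating one token at a time; in the worst case (every proposal rejected) it degrades to roughly standard decoding.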