gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Any plans to support Apple chips?
As discussed in #158, this PR unifies support for Llama 3. ~~However, you must convert the model files from the safetensors format to the PyTorch .bin format.~~ ~~Can be...
[gpt-fast](https://github.com/pytorch-labs/gpt-fast) uses torch.compile and achieves significant acceleration, so I want to adapt my LLaVA model to use torch.compile. LLaVA and Llama are similar: we need to use a CLIP model to process the image and then Llama's...
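For context, the acceleration in question comes from compiling the per-token decode step. A minimal sketch of that pattern, assuming a model whose forward takes `(x, input_pos)` and returns logits (the function name and model interface here are illustrative, not LLaVA's actual API):

```python
import torch

# Sketch of gpt-fast's compilation pattern: keep the slow-changing parts
# (e.g. a CLIP vision encoder) eager, and compile only the hot decode loop.
def decode_one_token(model, x, input_pos):
    logits = model(x, input_pos)
    return torch.argmax(logits[:, -1], dim=-1)

# "reduce-overhead" enables CUDA graphs for tiny per-token launches;
# fullgraph=True fails fast if anything in the function graph-breaks.
decode_one_token = torch.compile(
    decode_one_token, mode="reduce-overhead", fullgraph=True
)
```

A LLaVA port would presumably run the CLIP image encoding once, eagerly, and then call the compiled decode function in the generation loop.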
Support [dbrx](https://huggingface.co/databricks/dbrx-base) from Databricks. Initial perf numbers:

```
|                     | 1 GPU | 2 GPU | 4 GPU | 8 GPU |
|---------------------|-------|-------|-------|-------|
| baseline (bfloat16) | OOM   | OOM   | 59.53 | ...
```
Hi maintainers @yanboliang @Chillee, I encountered a codegen error when using `--compile_prefill` with int8 WoQ. Although it still runs, it could confuse users. Could you please fix...
This PR optimizes int8 WoQ in both gpt-fast and mixtral-moe. At the current stage, we use `torch.ops.aten._weight_int8pack_mm` as a workaround; the workaround will be removed when https://github.com/pytorch/pytorch/pull/120985...
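For readers unfamiliar with weight-only int8 quantization (WoQ), a rough sketch of the underlying math: weights are quantized per output channel to int8 with a floating-point scale, then dequantized on the fly inside the linear. The helper names below are illustrative, not gpt-fast's actual classes, and the fused aten op above computes roughly this pattern in one kernel.

```python
import torch

# Per-channel symmetric int8 weight-only quantization (illustrative).
def quantize_int8(w: torch.Tensor):
    # w: [out_features, in_features]; one scale per output channel.
    scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    w_int8 = torch.round(w / scales).clamp(-128, 127).to(torch.int8)
    return w_int8, scales.squeeze(1)

def int8_woq_linear(x: torch.Tensor, w_int8: torch.Tensor, scales: torch.Tensor):
    # Dequantize on the fly: matmul in the activation dtype, then rescale.
    return (x @ w_int8.t().to(x.dtype)) * scales

# Quick check: the quantized linear stays close to the full-precision one.
# (gpt-fast runs this in bfloat16; float32 here keeps the sketch portable.)
w = torch.randn(16, 32)
x = torch.randn(4, 32)
w_int8, scales = quantize_int8(w)
print((int8_woq_linear(x, w_int8, scales) - x @ w.t()).abs().max())
```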
Downloading from https://huggingface.co/hpcai-tech/grok-1:

```
git clone --branch grok1 git@github.com:pytorch-labs/gpt-fast.git && cd gpt-fast/mixtral-moe
export MODEL_REPO=hpcai-tech/grok-1
python scripts/download.py --repo_id $MODEL_REPO
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/$MODEL_REPO
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8
TOKENIZERS_PARALLELISM=false ENABLE_INTRA_NODE_COMM=1 ...
```
This PR adds initial Intel GPU support to gpt-fast via the device option "xpu" (i.e., `--device xpu`). Both single-device and multi-device (tensor parallel) runs are supported functionally, while...
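As a quick orientation, a hedged sketch of how device selection typically looks on the PyTorch side; the availability check assumes an XPU-enabled PyTorch build, and the fallback order is illustrative rather than what this PR does:

```python
import torch

# Prefer "xpu" when an Intel GPU build of PyTorch is present; otherwise
# fall back to CUDA or CPU. torch.xpu.is_available() only exists in
# XPU-enabled builds, hence the hasattr guard.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    device = "xpu"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

x = torch.randn(8, 8, device=device)  # tensors allocate on the chosen device
print(x.device)
```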
I downloaded `nvidia/Llama3-ChatQA-1.5-8B` manually from HF to a local directory and ran `scripts/convert_hf_checkpoint.py`. Then, when I tried to run generate.py against the local checkpoint dir, I got: `raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(...
Adds id_to_piece, piece_to_id, and is_special_token to TokenizerInterface, along with the corresponding implementations, so that user code can use the interface to encode/decode single tokens. These new functions are not...
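For a sense of what such helpers map to underneath, here is an illustrative sketch against a SentencePiece backend. The class and method placement are assumptions about the PR's design; only the `sentencepiece` calls themselves are real APIs.

```python
import sentencepiece as spm

class SentencePieceTokenizer:
    """Illustrative wrapper; gpt-fast's actual TokenizerInterface may differ."""

    def __init__(self, model_path: str):
        self.sp = spm.SentencePieceProcessor(model_file=model_path)

    def id_to_piece(self, token_id: int) -> str:
        # Map a single token id back to its surface piece, e.g. 0 -> "<unk>".
        return self.sp.id_to_piece(token_id)

    def piece_to_id(self, piece: str) -> int:
        # Inverse lookup: piece string to vocabulary id.
        return self.sp.piece_to_id(piece)

    def is_special_token(self, token_id: int) -> bool:
        # SentencePiece marks special tokens as "control" symbols.
        return self.sp.is_control(token_id)
```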