gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Any plans to support Apple chips?
As discussed in #158, this PR unifies support for Llama 3. ~~However, you must convert the model files from the safetensors format to the PyTorch .bin format.~~ ~~Can be...
[gpt-fast](https://github.com/pytorch-labs/gpt-fast) uses torch.compile and achieves significant acceleration, so I want to adapt my LLaVA model to use torch.compile. LLaVA and Llama are similar: we need to use a CLIP model to process the image and then Llama's...
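For context, the acceleration in question comes from compiling the per-token decode step. A minimal sketch of that pattern, assuming a model whose forward takes `(x, input_pos)` and returns logits (the function name and model interface here are illustrative, not LLaVA's actual API):

```python
import torch

# Sketch of gpt-fast's compilation pattern: keep the slow-changing parts
# (e.g. a CLIP vision encoder) eager, and compile only the hot decode loop.
def decode_one_token(model, x, input_pos):
    logits = model(x, input_pos)
    return torch.argmax(logits[:, -1], dim=-1)

# "reduce-overhead" enables CUDA graphs for tiny per-token launches;
# fullgraph=True fails fast if anything in the function graph-breaks.
decode_one_token = torch.compile(
    decode_one_token, mode="reduce-overhead", fullgraph=True
)
```

A LLaVA port would presumably run the CLIP image encoding once, eagerly, and then call the compiled decode function in the generation loop.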
Support [dbrx](https://huggingface.co/databricks/dbrx-base) from Databricks. Initial perf numbers:

```
|                     | 1 GPU | 2 GPU | 4 GPU | 8 GPU |
|---------------------|-------|-------|-------|-------|
| baseline (bfloat16) | OOM   | OOM   | 59.53 | ...
```
Hi maintainers @yanboliang @Chillee, I encountered a codegen error when using `--compile_prefill` with int8 WoQ. Although it still runs, it could confuse users. Could you please fix...
This PR optimizes int8 WoQ in both gpt-fast and mixtral-moe. At the current stage, we use `torch.ops.aten._weight_int8pack_mm` as a workaround; the workaround will be removed when https://github.com/pytorch/pytorch/pull/120985...
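For readers unfamiliar with weight-only int8 quantization (WoQ), a rough sketch of the underlying math: weights are quantized per output channel to int8 with a floating-point scale, then dequantized on the fly inside the linear. The helper names below are illustrative, not gpt-fast's actual classes, and the fused aten op above computes roughly this pattern in one kernel.

```python
import torch

# Per-channel symmetric int8 weight-only quantization (illustrative).
def quantize_int8(w: torch.Tensor):
    # w: [out_features, in_features]; one scale per output channel.
    scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    w_int8 = torch.round(w / scales).clamp(-128, 127).to(torch.int8)
    return w_int8, scales.squeeze(1)

def int8_woq_linear(x: torch.Tensor, w_int8: torch.Tensor, scales: torch.Tensor):
    # Dequantize on the fly: matmul in the activation dtype, then rescale.
    return (x @ w_int8.t().to(x.dtype)) * scales

# Quick check: the quantized linear stays close to the full-precision one.
# (gpt-fast runs this in bfloat16; float32 here keeps the sketch portable.)
w = torch.randn(16, 32)
x = torch.randn(4, 32)
w_int8, scales = quantize_int8(w)
print((int8_woq_linear(x, w_int8, scales) - x @ w.t()).abs().max())
```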
Downloading from https://huggingface.co/hpcai-tech/grok-1:

```
git clone --branch grok1 git@github.com:pytorch-labs/gpt-fast.git && cd gpt-fast/mixtral-moe
export MODEL_REPO=hpcai-tech/grok-1
python scripts/download.py --repo_id $MODEL_REPO
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/$MODEL_REPO
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8
TOKENIZERS_PARALLELISM=false ENABLE_INTRA_NODE_COMM=1 ...
```
This PR adds initial Intel GPU support to gpt-fast via the device option "xpu" (i.e., `--device xpu`). Both single-device and multi-device (tensor parallel) runs are supported functionally, while...
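As a quick orientation, a hedged sketch of how device selection typically looks on the PyTorch side; the availability check assumes an XPU-enabled PyTorch build, and the fallback order is illustrative rather than what this PR does:

```python
import torch

# Prefer "xpu" when an Intel GPU build of PyTorch is present; otherwise
# fall back to CUDA or CPU. torch.xpu.is_available() only exists in
# XPU-enabled builds, hence the hasattr guard.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    device = "xpu"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

x = torch.randn(8, 8, device=device)  # tensors allocate on the chosen device
print(x.device)
```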
I downloaded `nvidia/Llama3-ChatQA-1.5-8B` manually from HF to a local directory and ran `scripts/convert_hf_checkpoint.py`. Then, when I tried to run generate.py against the local checkpoint dir, I got: `raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(...
Adds id_to_piece, piece_to_id, and is_special_token to TokenizerInterface, along with the corresponding implementations, so that user code can use the interface to encode/decode single tokens. These new functions are not...
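For a sense of what such helpers map to underneath, here is an illustrative sketch against a SentencePiece backend. The class and method placement are assumptions about the PR's design; only the `sentencepiece` calls themselves are real APIs.

```python
import sentencepiece as spm

class SentencePieceTokenizer:
    """Illustrative wrapper; gpt-fast's actual TokenizerInterface may differ."""

    def __init__(self, model_path: str):
        self.sp = spm.SentencePieceProcessor(model_file=model_path)

    def id_to_piece(self, token_id: int) -> str:
        # Map a single token id back to its surface piece, e.g. 0 -> "<unk>".
        return self.sp.id_to_piece(token_id)

    def piece_to_id(self, piece: str) -> int:
        # Inverse lookup: piece string to vocabulary id.
        return self.sp.piece_to_id(piece)

    def is_special_token(self, token_id: int) -> bool:
        # SentencePiece marks special tokens as "control" symbols.
        return self.sp.is_control(token_id)
```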