gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Int4 quantization requires a CUDA device; however, in the current implementation the `--device` parameter was unconditionally overridden with 'cpu'.
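A minimal sketch of the kind of fix this implies, assuming the quantization entry point exposes a `--device` flag; the argument names below are illustrative, not necessarily gpt-fast's actual CLI:

```python
import argparse
import torch

def main() -> None:
    parser = argparse.ArgumentParser(description="Quantize a checkpoint (illustrative sketch).")
    parser.add_argument("--mode", choices=["int8", "int4"], default="int8")
    # Hypothetical flag: respect the user's choice instead of hardcoding 'cpu'.
    parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu")
    args = parser.parse_args()

    # Int4 packing kernels typically require CUDA, so fail loudly rather than
    # silently overriding the requested device with 'cpu'.
    if args.mode == "int4" and not args.device.startswith("cuda"):
        raise ValueError("int4 quantization requires a CUDA device; pass --device cuda")

    device = torch.device(args.device)
    print(f"quantizing in {args.mode} mode on {device}")

if __name__ == "__main__":
    main()
```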
Hi, I'm trying to get an example working with Ray on Databricks, essentially having multiple replicas of the model. Is it possible to load a model with tensor parallelism inside...
Performance numbers for the Llama3-8B implementation added by https://github.com/pytorch-labs/gpt-fast/pull/158
gpt-fast uses `torch.load` with `mmap=True` to load model checkpoints, which can speed up model load time. However, mmap ends up not being used for bf16, because in https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L247,...
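A rough sketch of the pattern in question: `torch.load(..., mmap=True)` keeps the checkpoint memory-mapped, but a subsequent dtype conversion (e.g. an unconditional cast to bf16) copies every tensor into regular memory, which is where the benefit is lost. The helper below only illustrates the idea and is not the repository's actual loading code:

```python
import torch
import torch.nn as nn

def load_checkpoint(model: nn.Module, checkpoint_path: str, dtype: torch.dtype | None = None) -> None:
    # mmap=True (PyTorch >= 2.1) maps the file instead of reading it eagerly,
    # so tensors are paged in lazily as they are accessed.
    state_dict = torch.load(checkpoint_path, map_location="cpu", mmap=True, weights_only=True)

    if dtype is not None:
        # Caveat: converting dtypes materializes tensors in RAM, so casting
        # unconditionally defeats mmap when the checkpoint is already stored
        # in the target dtype. Only convert tensors that actually need it.
        state_dict = {k: v.to(dtype) if v.dtype != dtype else v for k, v in state_dict.items()}

    # assign=True reuses the loaded tensors instead of copying them into
    # the module's existing parameters.
    model.load_state_dict(state_dict, assign=True)
```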
Improve code quality: the `empty` variable is redundant, so this change removes it.
INT8 quantization works fine, but INT4 does not work. 
Download the tinyllamas weights from https://huggingface.co/karpathy/tinyllamas/tree/main and the tinyllamas `tokenizer.model` from https://github.com/karpathy/llama2.c/raw/master/tokenizer.model.
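A hedged Python sketch of one way to fetch those two files, using `huggingface_hub` for the weights and `urllib` for the tokenizer; the weight filename (`stories15M.pt`) is an assumption here and just one of the checkpoints in that repo:

```python
import urllib.request
from huggingface_hub import hf_hub_download

# Weight checkpoint from karpathy/tinyllamas (filename assumed; pick the size you want).
weights_path = hf_hub_download(repo_id="karpathy/tinyllamas", filename="stories15M.pt")
print(f"weights downloaded to {weights_path}")

# Tokenizer model from the llama2.c repository.
tokenizer_url = "https://github.com/karpathy/llama2.c/raw/master/tokenizer.model"
urllib.request.urlretrieve(tokenizer_url, "tokenizer.model")
print("tokenizer.model downloaded")
```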
`n_local_heads` refers to TP sharding rather than GQA.
Currently the code only supports bs=1, with `input_pos` being one-dimensional. This fixes the `input_pos` shape in the comments.
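For reference, a small sketch of what the one-dimensional `input_pos` convention at bs=1 looks like in practice (illustrative, not taken verbatim from generate.py):

```python
import torch

prompt_len = 5

# Prefill: one position index per prompt token, shape [prompt_len].
input_pos = torch.arange(0, prompt_len)
assert input_pos.ndim == 1

# Decode: a single position for the next token, shape [1].
next_pos = torch.tensor([prompt_len])
assert next_pos.shape == (1,)
```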
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* __->__ #155

Summary:

hqq wikitext: {'word_perplexity,none': 12.698986130023261, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.6084602387562144, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6856802729143467, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}

not hqq wikitext: {'word_perplexity,none':...