gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
I loved seeing the blog post with a simple, standalone implementation of many techniques used in production to speed up LLMs. Would love to see this extended to MoE like...
Extend the existing device variable to support code gen for other targets. New args to generate: `--use_sdpa` # by default, SDPA is disabled for best performance on CUDA. CPU presents...
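A `use_sdpa` toggle would presumably switch between the fused `scaled_dot_product_attention` kernel and an eager fallback. A minimal sketch of what such a switch could look like; the `attention` helper and its signature are hypothetical, and only `F.scaled_dot_product_attention` is a real PyTorch API:

```python
# Hypothetical sketch of a use_sdpa toggle; only
# F.scaled_dot_product_attention is a real PyTorch API here.
import math
import torch
import torch.nn.functional as F

def attention(q, k, v, mask=None, use_sdpa: bool = True):
    if use_sdpa:
        # Fused SDPA kernel (flash / mem-efficient backends where available).
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    # Eager fallback for targets where SDPA is not the fastest choice.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```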
This PR fixes the checkpoint conversion scripts for the `.safetensors` checkpoints on https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1. This is necessary because (1) safetensors are loaded differently from `.pt` files, and (2) the `.pt` files and...
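Since the two formats load differently, a conversion script needs a branch on the file type. A rough sketch under the assumption that conversion starts from a flat state dict; `load_checkpoint` is a hypothetical helper, while `safetensors.torch.load_file` and `torch.load` are the real loading APIs:

```python
from pathlib import Path
import torch
from safetensors.torch import load_file  # real safetensors API

def load_checkpoint(checkpoint_path: Path) -> dict:  # hypothetical helper
    if checkpoint_path.suffix == ".safetensors":
        # safetensors files are unpickled, flat tensor dicts.
        return load_file(str(checkpoint_path))
    # .pt/.bin files are pickled; mmap (PyTorch >= 2.1) avoids a full RAM copy.
    return torch.load(checkpoint_path, map_location="cpu", mmap=True)
```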
I'm using torch.__version__ = 2.1.0a0+32f93b1, which doesn't have this op: AttributeError: '_OpNamespace' 'aten' object has no attribute '_convert_weight_to_int4pack'. What exactly does this op do, and is it defined elsewhere? Unfortunately, upgrading to the...
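For what it's worth, the error just means that build of PyTorch predates the op. One way to check for it at runtime, assuming failing with a RuntimeError is acceptable:

```python
import torch

# The op lives in the aten namespace; older builds simply don't define it.
if not hasattr(torch.ops.aten, "_convert_weight_to_int4pack"):
    raise RuntimeError(
        f"torch {torch.__version__} lacks aten._convert_weight_to_int4pack; "
        "int4 weight-only quantization requires a newer build."
    )
```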
Added a `scaling_factor` to the rotary embedding calculation. This is for use with models like [DeepSeek](https://github.com/deepseek-ai/), which uses `LlamaLinearScalingRotaryEmbedding`. The only difference is that the freqs in `precompute_freqs_cis` are divided...
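A sketch of how linear scaling might slot into a gpt-fast-style `precompute_freqs_cis`; dividing the position indices `t` by `scaling_factor` is equivalent to dividing the resulting freqs, following `LlamaLinearScalingRotaryEmbedding`. The exact placement of the division is the PR's, not confirmed here:

```python
import torch

def precompute_freqs_cis(seq_len: int, n_elem: int, base: int = 10000,
                         scaling_factor: float = 1.0) -> torch.Tensor:
    freqs = 1.0 / (base ** (torch.arange(0, n_elem, 2).float() / n_elem))
    # Linear RoPE scaling: stretch positions by 1/scaling_factor.
    t = torch.arange(seq_len, dtype=torch.float32) / scaling_factor
    freqs = torch.outer(t, freqs)
    freqs_cis = torch.polar(torch.ones_like(freqs), freqs)
    return torch.stack([freqs_cis.real, freqs_cis.imag], dim=-1)
```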
`torch.compile` always re-compiles a function from scratch in a new Python session, which takes a lot of time. I'm wondering if there's a way to cache the compilation result in...
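Depending on the PyTorch version, Inductor can persist compiled artifacts across processes. A sketch of two knobs present in recent releases; whether `fx_graph_cache` is available varies by version, so treat it as an assumption to verify:

```python
import os
# Where Inductor writes its on-disk artifacts (real env var).
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/tmp/inductor_cache"

import torch
import torch._inductor.config as inductor_config

# Reuse compiled FX graphs across Python sessions (recent releases only).
inductor_config.fx_graph_cache = True

@torch.compile
def decode_step(x: torch.Tensor) -> torch.Tensor:
    return torch.nn.functional.gelu(x)
```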
**Bug Report** **Description:** I encountered a bug when attempting to convert a model from Hugging Face (HF) using the provided code implementation. The issue appears to be related to counting...