exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Has anyone gotten 16k context length with CodeLlama or Llama 2? I have tried multiple models, but they all start producing gibberish once the context window gets past 4096. I...
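A minimal sketch of one common cause, assuming a recent exllama where `ExLlamaConfig` exposes `max_seq_len`, `compress_pos_emb`, and (in newer versions) `alpha_value`: the context window has to be raised *and* matched with the RoPE scaling the model was fine-tuned for, otherwise output tends to degrade past 4096. Paths below are placeholders.

```python
from model import ExLlama, ExLlamaCache, ExLlamaConfig

config = ExLlamaConfig("/path/to/model/config.json")    # hypothetical path
config.model_path = "/path/to/model/model.safetensors"  # hypothetical path
config.max_seq_len = 16384         # target context window
config.compress_pos_emb = 4.0      # linear RoPE scaling: 16384 / 4096, for models trained that way
# config.alpha_value = 4.0         # or NTK-style scaling, if your version supports it

model = ExLlama(config)
cache = ExLlamaCache(model)
```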
I want to drop tokens that exceed max_seq_len. How can I achieve this?
Where is this handled in the code?
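I can't point at the exact place in the code, but as a caller-side sketch you can truncate the prompt ids to the most recent tokens before generation; the function below is hypothetical, not exllama API:

```python
import torch

def truncate_ids(input_ids: torch.Tensor, max_seq_len: int, reserve: int = 256) -> torch.Tensor:
    """Keep only the newest (max_seq_len - reserve) token ids, dropping the oldest."""
    keep = max(max_seq_len - reserve, 1)
    return input_ids[:, -keep:] if input_ids.shape[-1] > keep else input_ids
```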
I have a machine with MI25 GPUs. Would anybody like SSH access to it to develop exllama support?
The current version of CUDA lets you access the component halves of a half2 through half2.x and half2.y, but in HIP, x and y are unsigned shorts rather than half...
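A hedged sketch of a portable workaround, using the component intrinsics (`__low2half`, `__high2half`, `__halves2half2`) instead of direct member access; I'm assuming these are available on both backends via hipify:

```cuda
#include <cuda_fp16.h>   // hipify is assumed to map this to hip/hip_fp16.h

// Swap the two halves of a half2 without touching .x / .y directly,
// so the same code compiles under both CUDA and HIP.
__device__ __forceinline__ half2 swap_halves(half2 v)
{
    half lo = __low2half(v);    // extract the low component as a half
    half hi = __high2half(v);   // extract the high component as a half
    return __halves2half2(hi, lo);
}
```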
I have a GPU on which I want to load multiple models. exllama loads all weights to the GPU as soon as `ExLlama` is instantiated. Is it possible if...
Have you tried this yet? https://github.com/InternLM/lmdeploy In my initial testing with 7B and 13B models, there's a noticeable per-token latency improvement (measured as the time to generate the first 5 tokens).
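For what it's worth, here is roughly how I'd take that measurement; `generate_n_tokens` is a hypothetical wrapper around whichever backend is being compared, not an exllama or lmdeploy API:

```python
import time

def time_first_tokens(generate_n_tokens, prompt: str, n: int = 5) -> float:
    """Average wall-clock seconds per token over the first n generated tokens."""
    start = time.perf_counter()
    generate_n_tokens(prompt, n)   # generate exactly n tokens, then stop
    return (time.perf_counter() - start) / n
```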
Trying to learn more about the optimizations.
I'm new to exllama; are there any tutorials on how to use it? I'm trying it with the Llama-2 70B model.
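There's no full tutorial that I know of, but a minimal sketch along the lines of the repo's example scripts looks like this; the model path and the GPU split are assumptions (a 70B GPTQ model generally won't fit on a single consumer GPU, and `set_auto_map` is how recent versions split layers across devices, if I remember right):

```python
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "/models/Llama-2-70B-GPTQ"                # hypothetical path
config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/model.safetensors"
config.set_auto_map("20,24")                          # assumed split across two GPUs (GB per device)

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Hello, my name is", max_new_tokens=40))
```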
In `exllama/model.py`, line 45, in `__init__`: `self.pad_token_id = read_config["pad_token_id"]` raises `KeyError: 'pad_token_id'`.
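A workaround sketch, not a confirmed fix: my guess is that the model's config.json simply omits `pad_token_id`, so you can add the key before loading (or make the read in model.py tolerant with `read_config.get("pad_token_id", 0)`). The path and the default of 0 are assumptions; check your tokenizer.

```python
import json

cfg_path = "/path/to/model/config.json"   # hypothetical path
with open(cfg_path) as f:
    cfg = json.load(f)
cfg.setdefault("pad_token_id", 0)         # add the missing key only if absent
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```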