ChunkLlama
Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
Hi guys, thank you for this excellent work! It seems that this code does not consider the attention_mask during chunkllama inference. Does this code support batch inference?
Exciting work, I am very interested! Since my coding ability is weak, could you provide CUDA code for DCA? It would be greatly appreciated.
How can I use this approach in a vLLM deployment without training? Can you give me a specific example? Thanks.
`Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:04
Fix typo: "repalce" to "replace"
I noticed that you compared many models in your paper. Could you please share the implementation code for the training-free models mentioned in Table 1, or the GitHub repository you...
This PR fixes a single-character typo in the library name: before, `MyMuPDF` (does not exist); after, the correct name `PyMuPDF`.
When I use Llama3 (with flash decoding) to run run_chunkllama_100k, it starts successfully. But when I input a prompt, I encounter a TypeError: ``` File "ChunkLlama/flash_decoding_chunkllama.py", line...
Hello, Thank you for providing DCA to scale model context up to 100K+. However, I encountered an issue when trying to inference with a 128k context using 4 GPUs. The...
Missing import of **_prepare_4d_causal_attention_mask_for_sdpa** from transformers.modeling_attn_mask_utils in flash_decoding_chunkllama.py: Ln 510 calls `attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(`, but Ln 7 only imports `_prepare_4d_causal_attention_mask`.
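A minimal sketch of the fix described in the issue above, assuming the rest of the import line in `flash_decoding_chunkllama.py` is unchanged (the `_for_sdpa` helper exists in recent `transformers` releases; verify against the installed version):

```diff
-from transformers.modeling_attn_mask_utils import _prepare_4d_causal_attention_mask
+from transformers.modeling_attn_mask_utils import (
+    _prepare_4d_causal_attention_mask,
+    _prepare_4d_causal_attention_mask_for_sdpa,
+)
```

With both helpers imported, the SDPA branch at Ln 510 can resolve `_prepare_4d_causal_attention_mask_for_sdpa` without a NameError.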