Results: 11 comments of Mooler0410

> As the paper mentioned, self-Extend do not support flash-attn.

We recently added flash-attention support for SelfExtend.

Author here, glad to answer any questions about the details of our work.

Llama.cpp supports SelfExtend and has a good implementation; it works with GGUF models. SelfExtend has received quite positive feedback from the llama.cpp community. You can check their repo for more details.

Hi! We have some empirical results on this. You can check out this link: [https://github.com/datamllab/LongLM?tab=readme-ov-file#3how-to-choose-the-group_size-and-neighbor_window](https://github.com/datamllab/LongLM?tab=readme-ov-file#3how-to-choose-the-group_size-and-neighbor_window). Hope this helps!
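As a rough rule, the extended window is about (pretraining length − neighbor window) × group size + neighbor window, so you can back out the smallest workable group size for a target input length. Below is a minimal sketch of that arithmetic; the helper name and example numbers are illustrative, not code from the repo, and in practice you should leave some safety margin.

```python
import math

def min_group_size(target_len: int, pretrain_len: int, neighbor_window: int) -> int:
    """Smallest group_size G such that
    (pretrain_len - neighbor_window) * G + neighbor_window >= target_len
    (assumed relation; leave some safety margin in practice)."""
    usable = pretrain_len - neighbor_window
    assert usable > 0, "neighbor_window must be smaller than the pretraining window"
    return max(1, math.ceil((target_len - neighbor_window) / usable))

# Example: a 4k-pretrained model, 16k target inputs, neighbor window of 1024
print(min_group_size(16 * 1024, 4 * 1024, 1024))  # -> 5
```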

If you are asking why we use this setting for 4k: actually, we just selected the two parameters somewhat arbitrarily, as long as they worked well, and we never considered whether...

We believe how well SelfExtend works depends highly on how good the extended model is within its original pretraining context window. This means that if Qwen1.5's 32k context window is...

We are not very familiar with vLLM and its internal mechanism. We will check its compatibility with SelfExtend. Thanks for your suggestion!

> I followed your direction like the below to apply selfextend to llama3
>
> """
>
> [04/19/2024]:💡 We added the support for LLama-3 with transformers==4.40. To use it...
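Roughly, applying SelfExtend to a loaded Hugging Face model follows the pattern below. This is a minimal sketch only: check the repo README for the exact `SelfExtend.apply` signature, and treat the group size and neighbor window values here as placeholders.

```python
from transformers import AutoModelForCausalLM
import SelfExtend  # module from the LongLM repo

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype="auto"
)
# Patch the model's attention in place: (model, group_size, neighbor window).
# Argument order assumed from the README; verify against the repo.
SelfExtend.apply(model, 8, 1024)
```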

Hi! We just implemented FlashAttention for SelfExtend, utilizing the windowed FA supported by flash_attn. In short, we merge two FA passes to get SelfExtend's attention. Check https://github.com/datamllab/LongLM/pull/28...
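For intuition, the merge combines two partial softmax-attention results using their per-query log-sum-exp values. Here is a minimal plain-PyTorch sketch of that merge step (not the PR's actual flash_attn code); it assumes the two passes cover disjoint key sets and use the same softmax scale.

```python
import torch

def merge_attention_parts(out_a, lse_a, out_b, lse_b):
    """Combine two partial attention outputs computed over disjoint key sets
    (e.g. a neighbor-window pass and a grouped pass) into the full result.

    out_*: [batch, heads, q_len, head_dim], softmax-normalized within each part
    lse_*: [batch, heads, q_len], log-sum-exp of the attention scores per part
    """
    lse = torch.logaddexp(lse_a, lse_b)            # normalizer of the union
    w_a = torch.exp(lse_a - lse).unsqueeze(-1)     # relative weight of part a
    w_b = torch.exp(lse_b - lse).unsqueeze(-1)     # relative weight of part b
    return w_a * out_a + w_b * out_b, lse
```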

> v0.1 only supports 8K token length, which leads to low performance. We use v0.2 because it supports 32K tokens. The first 3 subsets of BANKING77 are below 8k. So,...