gemma-2B-10M

Gemma 2B with 10M context length using Infini-attention.

9 gemma-2B-10M issues, sorted by recently updated

I made a Colab (https://colab.research.google.com/drive/1Z3NdoT0WS8KXnSUS3_xxT39NBZD6eGcN?usp=sharing) to test it and ran into an issue: `GemmaModel.forward() got an unexpected keyword argument 'cache_position'`. I had to change some of main.py to...
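
A hedged sketch of one possible workaround: newer transformers releases pass `cache_position` into the model's forward(), so a custom forward() that doesn't declare it will fail with exactly this error. The signature below is illustrative of the change, not copied from the repo's gemma.py; pinning transformers to an older release is the other obvious route.

```python
# Illustrative only: accept (and ignore) the new kwarg so transformers' generate()
# stops failing. The argument list is assumed, not the repo's actual signature.
def forward(
    self,
    input_ids=None,
    attention_mask=None,
    position_ids=None,
    past_key_values=None,
    inputs_embeds=None,
    use_cache=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
    cache_position=None,  # new in recent transformers; safe to accept and ignore here
    **kwargs,
):
    ...
```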

My notebook: Windows 11 Pro 23H2, Intel i7-8750H, GeForce GTX 1050 Ti (Mobile), 32 GB RAM (2666 MHz). After I removed the mention of flash_attn in gemma.py, I got the following error: `TypeError:...
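
A minimal sketch of how the flash_attn dependency could be made optional instead of deleted outright, so the model falls back to standard PyTorch attention on GPUs (like a GTX 1050 Ti) that cannot run FlashAttention. This is not the repo's code, just one way to guard the import.

```python
# Hedged sketch: try FlashAttention, otherwise fall back to PyTorch's built-in
# scaled_dot_product_attention (available since PyTorch 2.0).
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func
    HAS_FLASH_ATTN = True
except ImportError:
    flash_attn_func = None
    HAS_FLASH_ATTN = False

def attention(q, k, v, causal=True):
    """q, k, v shaped (batch, seqlen, nheads, headdim)."""
    if HAS_FLASH_ATTN:
        return flash_attn_func(q, k, v, causal=causal)
    # fallback expects (batch, nheads, seqlen, headdim), so transpose around the call
    return F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=causal
    ).transpose(1, 2)
```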

I don't quite understand how to install and run it. I downloaded this folder from GitHub and downloaded all 13 files from Hugging Face. What's next? In which folder should...
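
A hedged sketch of one way to fetch the checkpoint into a local folder that the loading code can point at, using huggingface_hub. The repo_id below is an assumption inferred from the project name, and the folder main.py actually expects may differ.

```python
# Hypothetical example: download all checkpoint files into ./gemma-2b-10m.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="mustafaaljadery/gemma-2B-10M",  # assumed repo id, not verified
    local_dir="./gemma-2b-10m",              # folder to point the loading code at
)
print("weights downloaded to:", local_dir)
```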

Congratulations on this super-exciting project! It would be awesome to top it off with a live Gradio demo on [Huggingface Spaces](https://huggingface.co/spaces). I think this could help with more community engagement...
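
For reference, a minimal, self-contained sketch of what such a Spaces demo could look like. The generate_text body is a placeholder standing in for the repo's generate() in main.py; only the Gradio wiring is shown.

```python
import gradio as gr

def generate_text(prompt, max_new_tokens=128):
    # placeholder: wire this up to the repo's generate() from main.py
    return f"(model output for a {len(prompt)}-character prompt)"

demo = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.Textbox(lines=8, label="Prompt"),
        gr.Slider(16, 512, value=128, step=16, label="Max new tokens"),
    ],
    outputs=gr.Textbox(label="Completion"),
    title="gemma-2B-10M (Infini-attention)",
)

if __name__ == "__main__":
    demo.launch()
```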

`generate()` in main.py seems to only process the last 2048 tokens of the input prompt? https://github.com/mustafaaljadery/gemma-2B-10M/blob/cb97c2f686a41d4d54c259437dcdcd4f7f8da5f0/src/main.py#L15C9-L15C54 If a prompt longer than 2048 tokens is entered, then generate seems...
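
For context, a hedged sketch of how segment-wise prefill usually looks with Infini-attention: the prompt is fed in fixed-size windows and a compressive memory is carried across segments, so earlier tokens influence generation instead of being silently truncated to the last 2048. The memory/norm_term handoff below is illustrative and not the repo's actual interface.

```python
import torch

SEGMENT_LEN = 2048  # window size used during prefill

def prefill_long_prompt(model, input_ids: torch.Tensor, segment_len: int = SEGMENT_LEN):
    """Feed the prompt segment by segment so earlier tokens land in the
    compressive memory instead of being dropped (illustrative interface)."""
    memory, norm_term = None, None
    outputs = None
    for start in range(0, input_ids.size(1), segment_len):
        segment = input_ids[:, start:start + segment_len]
        outputs = model(segment, memory=memory, norm_term=norm_term)
        memory, norm_term = outputs.memory, outputs.norm_term
    return outputs, memory, norm_term
```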

Hi, it's really exciting to see a 10M context window. But I don't have 32 GB of memory. Can I limit the context window to 100k to reduce the required memory so it fits...
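
One low-effort option, sketched here under the assumption that the input is a plain token tensor: cap the prompt before prefill. With Infini-attention the compressive memory itself is fixed-size, so peak memory is dominated by the weights plus one segment's activations rather than by the total context length.

```python
MAX_CONTEXT = 100_000  # illustrative cap, not a setting exposed by the repo

def cap_context(input_ids, max_context: int = MAX_CONTEXT):
    # keep only the most recent max_context tokens before prefill
    return input_ids[:, -max_context:]
```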

Hi, can this be fine-tuned with LoRA without any additional script? Also, during fine-tuning, if we use a sequence length of 512 or 1k, will it affect inference for higher...
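
A minimal sketch of attaching LoRA adapters with the peft library; whether this custom Infini-attention model is directly compatible with peft's wrappers is an open question, and the target_modules below are typical Gemma projection names, assumed rather than verified against this repo.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed names
    task_type="CAUSAL_LM",
)

# `model` is the already-loaded gemma-2B-10M model (loading not shown here)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```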

The model code provided in this repository seems to be a copy of the repository linked below: https://github.com/Beomi/InfiniTransformer/blob/main/infini_gemma/modeling_infini_gemma.py Specifically, GemmaInfiniAttention and GemmaModel seem to be a direct...

For local hardware usage, 24 GB is an interesting number (NVIDIA 3090, etc.). Can you give some idea of what folks might expect from this code on such hardware?
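
Some back-of-the-envelope sizing, stated as an assumption rather than a measured benchmark:

```python
# ~2B parameters in fp16/bf16 is roughly 3.7 GiB of weights.
params = 2e9
bytes_per_param = 2  # fp16 / bf16
weights_gib = params * bytes_per_param / 1024**3
print(f"fp16/bf16 weights: ~{weights_gib:.1f} GiB")  # ~3.7 GiB

# Unlike a standard KV cache, Infini-attention's compressive memory is fixed-size per
# layer/head and does not grow with context length, so the rest of a 24 GiB card's
# budget goes to activations for one segment plus framework overhead.
```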