ringattention
Transformers with Arbitrarily Large Context
Hi, I am trying to use the current script RingAttention main/scripts/jax2hf.py to convert the JAX model to Hugging Face format, which comes from https://huggingface.co/LargeWorldModel/LWM-Text-Chat-1M-Jax/tree/main. But there was an error, how...
First, great work! I read the paper and had a few questions. * On p. 5, the paper says that the minimal sequence length is `s = 6c`, but where does this...
In the project requirements, it is specified that the version of `jax` is `0.4.13`. However, Pallas was added in the version `0.4.16` (https://github.com/google/jax/commit/d872812a359a3bafcfdeba1fcdb874ec77c209db).
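The version conflict described above can be illustrated with a small sketch. The helper `pallas_available` below is hypothetical (not part of the repo); it only encodes the stated fact that Pallas first shipped in jax 0.4.16, so a pin of jax 0.4.13 cannot provide it:

```python
def pallas_available(jax_version: str) -> bool:
    """Return True if the given jax version ships Pallas.

    Assumption from the issue above: Pallas was added in jax 0.4.16,
    so any earlier version (e.g. the pinned 0.4.13) lacks it.
    """
    parts = tuple(int(p) for p in jax_version.split("."))
    return parts >= (0, 4, 16)

print(pallas_available("0.4.13"))  # the pinned version: False
print(pallas_available("0.4.16"))  # first version with Pallas: True
```

In other words, if the kernels in this repo import `jax.experimental.pallas`, the requirements pin would presumably need to be raised to at least `jax>=0.4.16`.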
Hi Hao, First off, big thank you for the huge amount of work that has gone into open sourcing the implementation of your research, it is highly appreciated! While going...
Hi, I tried to run your script on a Cloud TPU v4-64, but it failed with the following error: `jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space vmem....
Hi there, I am working on a long-context model. Is it possible to have the pretrained models?
I'm trying to run Ring Attention on a machine with 6 A100 GPUs, and I'm finding that when I try to set the sequence parallelism dimension to anything other than...
Hope you can help with this. I'm trying to implement ring attention using Llama 3 architecture and I'm starting with the blockwise parallel transformer piece. My question is when do...
Your idea is excellent and I have starred your repo. I want to check whether my understanding is correct: this paper does not modify the kernel implementation but instead considers that...
Hi! I am a researcher working on GPUs. Could you provide GPU code? Thanks!