ringattention
Transformers with Arbitrarily Large Context
Hi, I am trying to use the current script RingAttention main/scripts/jax2hf.py to convert the JAX model to Hugging Face format, which comes from https://huggingface.co/LargeWorldModel/LWM-Text-Chat-1M-Jax/tree/main. But there was an error, how...
First, great work! I read the paper and had a few questions. * On p. 5, the paper says that the minimal sequence length is `s = 6c`, but where does this...
In the project requirements, it is specified that the version of `jax` is `0.4.13`. However, Pallas was added in the version `0.4.16` (https://github.com/google/jax/commit/d872812a359a3bafcfdeba1fcdb874ec77c209db).
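The version conflict described above can be illustrated with a small sketch. The helper `pallas_available` below is hypothetical (not part of the repo); it only encodes the stated fact that Pallas first shipped in jax 0.4.16, so a pin of jax 0.4.13 cannot provide it:

```python
def pallas_available(jax_version: str) -> bool:
    """Return True if the given jax version ships Pallas.

    Assumption from the issue above: Pallas was added in jax 0.4.16,
    so any earlier version (e.g. the pinned 0.4.13) lacks it.
    """
    parts = tuple(int(p) for p in jax_version.split("."))
    return parts >= (0, 4, 16)

print(pallas_available("0.4.13"))  # the pinned version: False
print(pallas_available("0.4.16"))  # first version with Pallas: True
```

In other words, if the kernels in this repo import `jax.experimental.pallas`, the requirements pin would presumably need to be raised to at least `jax>=0.4.16`.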
Hi Hao, First off, big thank you for the huge amount of work that has gone into open sourcing the implementation of your research, it is highly appreciated! While going...
Hi, I tried to run your script on a Cloud TPU v4-64, but it failed with the following error: `jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space vmem....
Hi there, I am working on a long-context model. Is it possible to have the pretrained models?
I'm trying to run Ring Attention on a machine with 6 A100 GPUs, and I'm finding that when I try to set the sequence parallelism dimension to anything other than...
Hope you can help with this. I'm trying to implement ring attention using Llama 3 architecture and I'm starting with the blockwise parallel transformer piece. My question is when do...
Your idea is excellent and I have starred your repo. I want to check whether my understanding is correct: this paper does not modify the kernel implementation but instead considers that...
Hi! I am a researcher working on GPUs. Could you provide GPU code? Thanks!