Ahmed Elnaggar

41 comments by Ahmed Elnaggar

I have tested the above script on GPUs and it works without any issues. The long compilation process only occurs with TPUs.

Thanks a lot @skye for your explanation and support. I will use the GPUs for now and I hope the TPU issue will be solved in the near future.

+1 Any update on this request?

> > `jax.experimental` has an [implementation](https://github.com/google/jax/blob/main/jax/experimental/pallas/ops/tpu/flash_attention.py) of FlashAttention, written in Pallas kernels and therefore usable on both GPU and TPU.
> >
> > We can probably upstream this to Flax attention...

> `jax.experimental` has an [implementation](https://github.com/google/jax/blob/main/jax/experimental/pallas/ops/tpu/flash_attention.py) of FlashAttention, written in Pallas kernels and therefore usable on both GPU and TPU.
>
> We can probably upstream this to Flax attention if...

One more point: JAX has two implementations:

1. Fused attention: https://github.com/google/jax/blob/main/jax/experimental/pallas/ops/attention.py
2. Flash attention: https://github.com/google/jax/blob/main/jax/experimental/pallas/ops/tpu/flash_attention.py
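For context, a minimal sketch of calling the Pallas TPU flash-attention kernel might look like the following. The module path matches the link above, but it has moved between JAX releases, and the keyword arguments (`causal`, `sm_scale`) are an assumption to verify against your installed version:

```python
import jax
import jax.numpy as jnp
# Module path as linked above; it has moved across JAX versions, so verify.
from jax.experimental.pallas.ops.tpu.flash_attention import flash_attention

batch, heads, seq_len, head_dim = 2, 8, 1024, 64
kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)

# The Pallas kernel expects (batch, num_heads, seq_len, head_dim) inputs.
q = jax.random.normal(kq, (batch, heads, seq_len, head_dim), jnp.float32)
k = jax.random.normal(kk, (batch, heads, seq_len, head_dim), jnp.float32)
v = jax.random.normal(kv, (batch, heads, seq_len, head_dim), jnp.float32)

# `causal` and `sm_scale` are keyword-only in the versions I have seen.
out = flash_attention(q, k, v, causal=True, sm_scale=1.0 / head_dim**0.5)
print(out.shape)  # (2, 8, 1024, 64)
```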

Thanks a lot @IvyZX for integrating flash attention. I am just afraid that it is still missing some parameters, like dropout, mask, and bias, if I am not mistaken.
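For comparison, the existing `flax.linen.dot_product_attention` already exposes all three of those parameters. A minimal sketch, using the Flax shape convention `(batch, seq_len, heads, head_dim)`:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

batch, seq_len, heads, head_dim = 2, 128, 4, 32
kq, kk, kv, kdrop = jax.random.split(jax.random.PRNGKey(0), 4)

# Flax convention: (batch, seq_len, num_heads, head_dim).
q = jax.random.normal(kq, (batch, seq_len, heads, head_dim))
k = jax.random.normal(kk, (batch, seq_len, heads, head_dim))
v = jax.random.normal(kv, (batch, seq_len, heads, head_dim))

# Additive bias (e.g. ALiBi or a T5 relative bias) and a boolean mask,
# both broadcastable to (batch, heads, seq_len, seq_len).
bias = jnp.zeros((1, heads, seq_len, seq_len))
mask = jnp.tril(jnp.ones((1, 1, seq_len, seq_len), dtype=bool))

out = nn.dot_product_attention(
    q, k, v,
    bias=bias,
    mask=mask,
    dropout_rng=kdrop,
    dropout_rate=0.1,
    deterministic=False,
)
print(out.shape)  # (2, 128, 4, 32)
```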

@IvyZX could you please share the link to the current PR?

If I am not mistaken, many state-of-the-art embedding methods require an attention bias, e.g. ALiBi and the relative position encoding used in T5 models.
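To illustrate, here is a minimal sketch of building an ALiBi-style additive bias that could be passed through a `bias` argument like the one above. This is illustrative, not any library's API; the slopes follow the geometric schedule from the ALiBi paper (power-of-two head counts), and the symmetric `-|i - j|` penalty is a common bidirectional variant of the original causal formulation:

```python
import jax.numpy as jnp

def alibi_bias(num_heads: int, seq_len: int) -> jnp.ndarray:
    """Additive ALiBi bias of shape (1, num_heads, seq_len, seq_len).

    Each head penalizes attention to distant positions with its own slope;
    the slopes form a geometric sequence as in the ALiBi paper.
    """
    # Slopes 2^(-8/n), 2^(-16/n), ... for n heads (power-of-two case).
    slopes = 2.0 ** (-8.0 * jnp.arange(1, num_heads + 1) / num_heads)
    pos = jnp.arange(seq_len)
    # Symmetric distance penalty; the original paper uses it causally.
    distance = jnp.abs(pos[None, :] - pos[:, None])   # (seq, seq)
    bias = -slopes[:, None, None] * distance          # (heads, seq, seq)
    return bias[None, ...]                            # (1, heads, seq, seq)

bias = alibi_bias(num_heads=4, seq_len=8)
print(bias.shape)  # (1, 4, 8, 8)
```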

Thanks a lot, @milot-mirdita, for your help. It did work out. However, when I applied `result2profile` and `profile2pssm` to extract the PSSM, it showed protein sequences that were not part...