Sparsh Tewatia

Results 3 issues of Sparsh Tewatia

### The model to consider. https://huggingface.co/microsoft/Phi-3-medium-128k-instruct I was trying to run the exl2 quants for these models, but I get an error at the rotary embedding; these models use two rope scaling...

In the JAX experimental Pallas kernels for TPU, attention logit soft-capping is supported for paged attention but not for flash attention. If support can be added for Pallas...
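
For readers unfamiliar with the term, logit soft-capping is commonly written as `cap * tanh(logits / cap)`, applied to the attention scores before the softmax. The sketch below is a minimal plain-Python illustration of that formula only; it is not the Pallas kernel code, and the names `soft_cap`, `softmax`, and the cap value of 50.0 are assumptions chosen for the example.

```python
import math


def soft_cap(logits, cap):
    """Squash each logit smoothly into (-cap, cap) via cap * tanh(x / cap)."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(x / cap) for x in logits]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention scores for one query against four keys.
raw = [-200.0, 0.0, 5.0, 200.0]
capped = soft_cap(raw, cap=50.0)  # large magnitudes are pulled inside (-50, 50)
weights = softmax(capped)         # attention weights computed from capped scores
```

Small scores pass through almost unchanged while extreme scores saturate near the cap, which is why the operation has to happen inside the attention kernel, between the score computation and the softmax.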

enhancement

**Describe the bug** I want to train a reward model using EasyDeL with sequence classification. The classifier has been implemented in the Flax sequence classifier classes for each model, but...