Sparsh Tewatia
### The model to consider. https://huggingface.co/microsoft/Phi-3-medium-128k-instruct I was trying to run the exl2 quants for these models, but I get an error at the rotary embedding; these models use two RoPE scaling...
In the JAX experimental Pallas kernels for TPU, attention-logit soft-capping is supported for paged attention but not for flash attention. If support can be added for pallas...
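For readers unfamiliar with the term, here is a minimal sketch of what logit soft-capping commonly means: each attention logit is passed through `cap * tanh(logit / cap)` before the softmax, which keeps it inside `(-cap, cap)`. This is plain Python for illustration only, not the Pallas kernel code; the function names and the cap value of 50.0 are assumptions made for this example.

```python
import math

def softcap(logit: float, cap: float) -> float:
    """Smoothly bound a logit to the open interval (-cap, cap)."""
    return cap * math.tanh(logit / cap)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def capped_attention_weights(logits, cap):
    """Apply soft-capping to raw attention logits, then normalize."""
    return softmax([softcap(x, cap) for x in logits])

# Hypothetical raw logits for one query against three keys.
logits = [100.0, 0.0, -100.0]
weights = capped_attention_weights(logits, cap=50.0)
```

Small logits pass through almost unchanged, while large ones saturate near the cap, so no single key can dominate the softmax by an unbounded margin.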
**Describe the bug** I want to train a reward model using EasyDeL with sequence classification. The classifier has been implemented in the Flax sequence classifier classes for each model, but...