Andre Slavescu
*Description of changes:*
- added rich (version==13.3.2) to requirements.txt (a dependency conflict occurred without it)
- added z-loss
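For context on the z-loss item, here is a minimal sketch of the commonly used auxiliary z-loss, which penalizes the squared log of the softmax normalizer. It is not taken from the PR itself: the function names, the pure-Python list representation, and the coefficient `1e-4` are illustrative assumptions.

```python
import math

def log_sum_exp(logits):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def z_loss(logits, coeff=1e-4):
    """Auxiliary z-loss: coeff * log(Z)^2, with Z = sum(exp(logits)).

    coeff=1e-4 is an assumed default, not a value from the PR.
    """
    return coeff * log_sum_exp(logits) ** 2

def cross_entropy_with_z_loss(logits, target, coeff=1e-4):
    """Cross-entropy for one example plus the z-loss penalty."""
    log_z = log_sum_exp(logits)
    ce = log_z - logits[target]  # -log softmax(logits)[target]
    return ce + coeff * log_z ** 2
```

The penalty pushes log(Z) toward zero, which keeps the logits from drifting to large magnitudes during training.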
reference to issue https://github.com/vllm-project/vllm/issues/198
Just as a simple suggestion, I'm wondering if a discussion tab could be added for posting problems with running the system, as opposed to opening individual issues. This can keep...
GELU implementation: process in chunks and use hardware intrinsics for slightly faster performance. Results: (original implementation) GPU cuda_gelu execution times: [13.78, 13.26, 13.95, 13.0, 12.93, 13.46] (second implementation) GPU...
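As a reference for what such a kernel computes, the sketch below gives the exact GELU, the widely used tanh approximation, and a chunked loop over the input. It is a CPU-side illustration in plain Python, not the CUDA kernel from this PR; the function names and the chunk size of 4 are assumptions, and which GELU variant the PR uses is not stated in the source.

```python
import math

def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Tanh approximation of GELU."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

def gelu_chunked(values, chunk_size=4):
    """Apply gelu_tanh to a list, one fixed-size chunk at a time.

    chunk_size=4 is a hypothetical choice; the last chunk may be shorter.
    """
    out = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        out.extend(gelu_tanh(v) for v in chunk)
    return out
```

On a GPU, the chunked loop would typically correspond to each thread handling several contiguous elements, with the tanh evaluated by a fast intrinsic.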
## Goal:
- decide which optimizers are worth implementing, in priority order

## Remaining Optimizers + General Usage
From my understanding there are really only 2 from the ones...
Would it be an idea to define the same kernels that exist in the CUDA backend with ThunderKittens as well? They have cool examples with FlashAttention2 and I think it...
- [x] Restructure Makefile (automate detection of compute capability)
- [ ] Optimize existing kernels
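One way to automate compute capability detection is to query `nvidia-smi` and convert the result into an architecture flag. The sketch below is an assumption about how this could be done, not the Makefile logic from this PR: the helper names are hypothetical, and the `compute_cap` query field requires a reasonably recent NVIDIA driver.

```python
import subprocess

def arch_flag(compute_cap):
    """Turn a compute capability string such as '8.6' into 'sm_86'."""
    major, minor = compute_cap.strip().split(".")
    return f"sm_{int(major)}{int(minor)}"

def detect_arch_flags():
    """Return the sorted, de-duplicated arch flags for all visible GPUs.

    Raises if nvidia-smi is missing or exits with a non-zero status.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted({arch_flag(line) for line in out.splitlines() if line.strip()})
```

The resulting value could then be passed to `nvcc` through its `-arch` option, for example `-arch=sm_86`.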
ref: https://github.com/castorini/ura-projects/issues/39