EasyDeL
Performance Optimization Opportunities
Description
While reviewing the codebase, I identified several areas where int1 kernels or boolean operations could improve performance and reduce memory usage. These optimizations should help most in code paths that involve bit-level operations, compact data storage, or high-frequency computations.
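To make the idea concrete, here is a minimal pure-Python sketch of the kind of trick an int1 kernel relies on: packing ±1 values into single bits and computing a dot product with XNOR plus a population count. The helper names `pack_signs` and `binary_dot` are hypothetical and are not part of EasyDeL; a real kernel would operate on packed machine words on the accelerator rather than on Python integers.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into one int; bit i is 1 when values[i] > 0."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # bit is 1 where the two signs match
    matches = bin(agree).count("1")    # population count
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))
packed = binary_dot(pack_signs(a), pack_signs(b), len(a))
assert packed == reference
```

Stored this way, each element takes one bit instead of a full 8-, 16-, or 32-bit value, which is where the memory savings would come from. Whether this translates into a speedup depends on the operation and hardware, which is why the benchmarking step below matters.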
Next Steps:
- [ ] Benchmark the changes to quantify performance gains.
- [ ] Submit a pull request with the optimizations.
- [ ] Research and prototype int1 or boolean-based implementations.
- [ ] For some operations, consider using CUDA kernels instead of Triton for better performance (e.g., RMSNorm, MMU, ...).