InternEvo
[Feature] CPU synchronization Problem
Describe the feature
Some CPU-GPU synchronization points make the host wait for the device before it can launch the next kernel, which leaves idle gaps (bubbles) between GPU kernels. These should be optimized in the future. Known cases:
- `item()` in rotary embedding.
- `moe_loss` construction.
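
A minimal PyTorch sketch of the kind of change this could involve, not the actual InternEvo code. All function names here (`build_rope_cache`, `rope_lookup_sync`, `rope_lookup_async`, `sum_aux_losses`) are hypothetical, and it assumes the position indexes always fall within a precomputed maximum length and that the MoE auxiliary losses can stay as device tensors until they are reduced.

```python
import torch

# Falls back to CPU so the sketch runs anywhere; the sync cost only shows up on CUDA.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def build_rope_cache(max_len: int, dim: int, base: float = 10000.0):
    """Precompute cos/sin tables once for a fixed upper bound on sequence length."""
    exponents = torch.arange(0, dim, 2, device=device, dtype=torch.float32) / dim
    inv_freq = 1.0 / (base ** exponents)
    positions = torch.arange(max_len, device=device, dtype=torch.float32)
    freqs = torch.outer(positions, inv_freq)  # shape: (max_len, dim // 2)
    return freqs.cos(), freqs.sin()


def rope_lookup_sync(cos, sin, indexes):
    # Pattern to avoid: .item() copies a scalar to the host, so the CPU
    # has to wait for all queued GPU work that produces `indexes`.
    max_pos = indexes.max().item() + 1
    return cos[:max_pos][indexes], sin[:max_pos][indexes]


def rope_lookup_async(cos, sin, indexes):
    # Index the precomputed tables directly with a device tensor;
    # no value is read back to the host.
    return cos[indexes], sin[indexes]


def sum_aux_losses(losses):
    # Keep per-layer losses as 0-dim device tensors and reduce them on device,
    # rather than converting each one to a Python number first.
    return torch.stack(losses).sum()


cos, sin = build_rope_cache(max_len=64, dim=8)
indexes = torch.tensor([0, 1, 2, 0, 1], device=device)

c_sync, s_sync = rope_lookup_sync(cos, sin, indexes)
c_async, s_async = rope_lookup_async(cos, sin, indexes)
same = torch.equal(c_sync, c_async) and torch.equal(s_sync, s_async)

layer_losses = [
    torch.full((), 0.5, device=device),
    torch.full((), 0.25, device=device),
]
total = sum_aux_losses(layer_losses)
```

Both lookups return the same values; the difference is that the second never reads a tensor value on the host, so kernel launches can keep being queued. Whether this applies directly depends on how the rotary cache and `moe_loss` are actually built in the code base.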
Will you implement it?
- [ ] I would like to implement this feature and create a PR!