Leyang Xue

Results 11 issues of Leyang Xue

Howard, Heidi, and Richard Mortier. "Paxos vs Raft: Have we reached consensus on distributed consensus?." Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data. 2020

area/distributed-systems
TODO-unread
type/paper

- release experts parallel version
- correct README
- support arctic and grok
- remove installation dependency
- remove circular dependency issue

- [x] API design
- [x] Document for installation and PyPI
- [x] performance table
- [x] Support Mixtral multi-GPU
- [ ] Load trace

The Colab T4 server has 12GB of DRAM and 16GB of GPU memory. The quantized Mixtral model is 26GB in a single checkpoint, so it cannot be loaded into memory when creating the custom format for offloading.
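The arithmetic behind this report can be checked with a rough sketch. The function names below are hypothetical and not part of any project API, and the check ignores runtime overhead (framework buffers, activations), so real headroom would be smaller than these numbers suggest.

```python
import math


def fits_in_memory(checkpoint_gb: float, dram_gb: float, gpu_gb: float) -> dict:
    """Report whether a single checkpoint fits in each memory pool (sizes in GB)."""
    return {
        "dram": checkpoint_gb <= dram_gb,
        "gpu": checkpoint_gb <= gpu_gb,
        # Combined capacity only helps if the loader can split the checkpoint.
        "combined": checkpoint_gb <= dram_gb + gpu_gb,
    }


def min_shards(checkpoint_gb: float, dram_gb: float) -> int:
    """Smallest number of equal shards such that one shard fits in DRAM."""
    return math.ceil(checkpoint_gb / dram_gb)


# Numbers taken from the issue text: 26GB checkpoint, 12GB DRAM, 16GB GPU.
report = fits_in_memory(26, 12, 16)
shards = min_shards(26, 12)
```

With these figures the checkpoint fits in neither pool on its own, which is consistent with the failure described; a sharded checkpoint is one possible way around it, though the issue does not say which fix was chosen.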

enhancement

Format on PR to main branch
- use [pre-commit hooks](https://github.com/pre-commit/pre-commit)
- use the DeepSpeed GitHub [workflow](https://github.com/microsoft/DeepSpeed/blob/master/.github/workflows/formatting.yml)

Developers need to run `pre-commit run --all-files` before opening a PR.

## Description
Major changes for performance improvement
## Motivation
- Support the latest Qwen3 MoE model
- Overlap hidden states gather with expert copy
- Reduce torch kernel launch overhead
##...

### Prerequisites
- [x] I have searched existing issues and reviewed documentation.
### Problem Description
The current sllm store only shares parameters using GPU handles; it would be more beneficial if...

### Prerequisites
- [x] I have searched existing issues and reviewed documentation.
### Problem Description
Reading the model from disk is slow, achieving only 2GB/s on a 12GB/s SSD.
### Proposed Solution
1....
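A figure like the 2GB/s above can be reproduced with a simple sequential-read benchmark. The sketch below is a minimal, assumed approach rather than the project's own measurement method; `read_throughput` and its `chunk_bytes` parameter are hypothetical names. Results on a file that is already in the OS page cache will overstate what the SSD delivers.

```python
import time


def read_throughput(path: str, chunk_bytes: int = 16 * 1024 * 1024):
    """Read a file sequentially and return (bytes_read, GB per second)."""
    buf = bytearray(chunk_bytes)  # reused buffer avoids per-chunk allocation
    total = 0
    start = time.perf_counter()
    # buffering=0 gives a raw FileIO object, so reads go straight to the OS.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:
                break
            total += n
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed / 1e9
```

Calling `read_throughput("model.bin")` on a checkpoint file (the file name here is only an example) returns the byte count and the observed rate, which can then be compared with the drive's rated bandwidth.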

enhancement

## Description
Fuse MoE layer kernels
## Motivation
Kernel launch overhead too large
## Type of Change
- [ ] Bug fix
- [x] New feature
- [x] Breaking change...