[Store] add tp awareness for get_tensor
Description
Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Transfer Engine
- [x] Mooncake Store
- [ ] Mooncake EP
- [ ] Integration
- [ ] P2P Store
- [ ] Python Wheel
- [ ] Breaking change
- [ ] CI/CD
- [ ] Documentation update
- [ ] Other
How Has This Been Tested?
Checklist
- [ ] I have performed a self-review of my own code.
- [ ] I have updated the documentation.
- [ ] I have added tests to prove my changes are effective.
Summary of Changes
Hello @XucSh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the Mooncake store's get_tensor functionality by integrating tensor parallelism awareness. This allows the system to efficiently retrieve specific portions of large tensors, which is crucial for distributed machine learning workloads. The changes involve modifying the core C++ implementation to handle tensor slicing and introducing new, thorough Python tests to validate this behavior across different splitting dimensions.
Highlights
- **Tensor Parallelism (TP) Support in `get_tensor`:** The `get_tensor` function in the Mooncake store's Python interface (`MooncakeStorePyWrapper`) now supports optional `tp_rank`, `tp_size`, and `split_dim` parameters. This allows the function to retrieve a specific slice of a tensor, enabling tensor parallelism in distributed training setups (see the usage sketch after this list).
- **Robustness Improvements:** Added validation to ensure the retrieved data buffer is large enough to contain `TensorMetadata`, preventing potential crashes with malformed data. An error path for unsupported dtypes now correctly deallocates memory.
- **Comprehensive TP Testing:** The `test_tensor_api.py` script has been significantly updated to include a dedicated test suite for tensor parallelism. This suite verifies correct tensor slicing for both row (`split_dim=0`) and column (`split_dim=1`) parallelism, including shape validation, contiguity checks, and full tensor reconstruction.
- **Benchmark Enhancements:** The benchmark script now generates 2D tensors with dimensions suitable for TP testing, and includes improved output formatting and error messages for better clarity during execution.
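To make the new interface concrete, here is a minimal usage sketch. The import path, store setup, and exact keyword names are assumptions based on the summary above, not verified against the merged code:

```python
# Sketch of TP-aware retrieval; the parameter names (tp_rank, tp_size,
# split_dim) are taken from the summary above, and the import path is assumed.
import torch
from mooncake.store import MooncakeDistributedStore

store = MooncakeDistributedStore()
# ... store setup / connection omitted ...

key = "model.layers.0.qkv_proj.weight"
full = torch.randn(4096, 4096)   # 64 MiB fp32 tensor
store.put_tensor(key, full)      # assumes a put_tensor counterpart exists

# Each TP rank fetches only its slice instead of the full tensor.
tp_size = 4
for tp_rank in range(tp_size):
    shard = store.get_tensor(key, tp_rank=tp_rank, tp_size=tp_size, split_dim=0)
    assert shard.shape == (4096 // tp_size, 4096)  # row parallelism (split_dim=0)
    assert shard.is_contiguous()
```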
Could we show performance data here? @XucSh
@stmatengss Here is a test result for getting a 64 MB tensor: `get_tensor` fetches the full tensor, while `get_tensor_with_tp` fetches only 1/tp_size of it.
| TP Size | Metric | get_tensor (s) | get_tensor_with_tp (s) | Speedup |
|---|---|---|---|---|
| 2 | Latency | 0.112739 | 0.025488 | 4.42x |
| 4 | Latency | 0.051946 | 0.025382 | 2.05x |
| 8 | Latency | 0.051199 | 0.026607 | 1.92x |
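For reference, numbers like these can be gathered with a simple timing loop along the following lines (a sketch only, not the PR's benchmark script; the TP keyword names are assumptions from the summary above):

```python
import time

def time_call(fn, warmup=3, iters=10):
    """Average wall-clock latency of fn() over `iters` runs after warmup."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Assumes `store` holds a 64 MB tensor under `key` (setup as in the test script).
full_s = time_call(lambda: store.get_tensor(key))
tp_s = time_call(lambda: store.get_tensor(key, tp_rank=0, tp_size=4, split_dim=0))
print(f"full: {full_s:.6f}s  slice: {tp_s:.6f}s  speedup: {full_s / tp_s:.2f}x")
```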
Let's update the Python API docs as well: `docs/source/python-api-reference/mooncake-store.md`
Got it. Will merge it after review
What I would like to see are some APIs like these:

```cpp
// Each rank stores only its own shard; Mooncake does not split the tensor.
int put_tensor_with_tp(const std::string &key, pybind11::object tensor,
                       int tp_size, int tp_rank);
```

Each rank can then invoke `put_tensor_with_tp(key, tensor_in_this_rank, tp_size, tp_rank)`, so we don't need to split the tensor on the Mooncake side.
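For illustration, the per-rank flow could look like this from the Python side (a hypothetical binding of the proposed API; the name `put_tensor_with_tp` and its return convention are assumptions, not part of this PR):

```python
# Hypothetical per-rank flow for the proposed put_tensor_with_tp binding.
# Each rank writes only its local shard, so Mooncake never has to split
# the full tensor itself.
def save_sharded(store, key, local_shard, tp_size, tp_rank):
    # Argument order mirrors the C++ sketch above; assumes a 0-on-success
    # integer return code.
    rc = store.put_tensor_with_tp(key, local_shard, tp_size, tp_rank)
    if rc != 0:
        raise RuntimeError(f"put_tensor_with_tp failed on rank {tp_rank}: {rc}")

# On rank r of tp_size ranks, after computing the local shard:
#   save_sharded(store, "layer0.weight", shard, tp_size, r)
```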