[Store] add tp awareness for get_tensor
Description
Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Transfer Engine
- [x] Mooncake Store
- [ ] Mooncake EP
- [ ] Integration
- [ ] P2P Store
- [ ] Python Wheel
- [ ] Breaking change
- [ ] CI/CD
- [ ] Documentation update
- [ ] Other
How Has This Been Tested?
Checklist
- [ ] I have performed a self-review of my own code.
- [ ] I have updated the documentation.
- [ ] I have added tests to prove my changes are effective.
Summary of Changes
Hello @XucSh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the Mooncake store's get_tensor functionality by integrating tensor parallelism awareness. This allows the system to efficiently retrieve specific portions of large tensors, which is crucial for distributed machine learning workloads. The changes involve modifying the core C++ implementation to handle tensor slicing and introducing new, thorough Python tests to validate this behavior across different splitting dimensions.
Highlights
- **Tensor Parallelism (TP) Support in `get_tensor`:** The `get_tensor` function in the Mooncake store's Python interface (`MooncakeStorePyWrapper`) now supports optional `tp_rank`, `tp_size`, and `split_dim` parameters. This allows the function to retrieve a specific slice of a tensor, enabling tensor parallelism in distributed training setups (see the usage sketch after this list).
- **Robustness Improvements:** Added validation to ensure the retrieved data buffer is large enough to contain `TensorMetadata`, preventing potential crashes with malformed data. An error path for unsupported dtypes now correctly deallocates memory.
- **Comprehensive TP Testing:** The `test_tensor_api.py` script has been significantly updated to include a dedicated test suite for tensor parallelism. This suite verifies correct tensor slicing for both row (`split_dim=0`) and column (`split_dim=1`) parallelism, including shape validation, contiguity checks, and full tensor reconstruction.
- **Benchmark Enhancements:** The benchmark script now generates 2D tensors with dimensions suitable for TP testing, and includes improved output formatting and error messages for better clarity during execution.
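To make the new interface concrete, here is a minimal usage sketch. The import path, store setup, and exact keyword names are assumptions based on the summary above, not verified against the merged code:

```python
# Sketch of TP-aware retrieval; the parameter names (tp_rank, tp_size,
# split_dim) are taken from the summary above, and the import path is assumed.
import torch
from mooncake.store import MooncakeDistributedStore

store = MooncakeDistributedStore()
# ... store setup / connection omitted ...

key = "model.layers.0.qkv_proj.weight"
full = torch.randn(4096, 4096)   # 64 MiB fp32 tensor
store.put_tensor(key, full)      # assumes a put_tensor counterpart exists

# Each TP rank fetches only its slice instead of the full tensor.
tp_size = 4
for tp_rank in range(tp_size):
    shard = store.get_tensor(key, tp_rank=tp_rank, tp_size=tp_size, split_dim=0)
    assert shard.shape == (4096 // tp_size, 4096)  # row parallelism (split_dim=0)
    assert shard.is_contiguous()
```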
Could we show performance data here? @XucSh
@stmatengss Here is a test result for getting a 64 MB tensor: `get_tensor` fetches the full tensor, while `get_tensor_with_tp` fetches only 1/tp_size of it.
| TP Size | Metric | get_tensor (s) | get_tensor_with_tp (s) | Speedup |
|---|---|---|---|---|
| 2 | Latency | 0.112739 | 0.025488 | 4.42x |
| 4 | Latency | 0.051946 | 0.025382 | 2.05x |
| 8 | Latency | 0.051199 | 0.026607 | 1.92x |
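For reference, numbers like these can be gathered with a simple timing loop along the following lines (a sketch only, not the PR's benchmark script; the TP keyword names are assumptions from the summary above):

```python
import time

def time_call(fn, warmup=3, iters=10):
    """Average wall-clock latency of fn() over `iters` runs after warmup."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Assumes `store` holds a 64 MB tensor under `key` (setup as in the test script).
full_s = time_call(lambda: store.get_tensor(key))
tp_s = time_call(lambda: store.get_tensor(key, tp_rank=0, tp_size=4, split_dim=0))
print(f"full: {full_s:.6f}s  slice: {tp_s:.6f}s  speedup: {full_s / tp_s:.2f}x")
```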
Let's update the Python API docs as well: `docs/source/python-api-reference/mooncake-store.md`
Got it. Will merge it after review
What I would like to see are some APIs like these:

```cpp
// Each rank stores only its own shard; Mooncake does not split the tensor.
int put_tensor_with_tp(const std::string &key, pybind11::object tensor,
                       int tp_size, int tp_rank);
```

Each rank can then invoke `put_tensor_with_tp(key, tensor_in_this_rank, tp_size, tp_rank)`, so we don't need to split the tensor on the Mooncake side.
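For illustration, the per-rank flow could look like this from the Python side (a hypothetical binding of the proposed API; the name `put_tensor_with_tp` and its return convention are assumptions, not part of this PR):

```python
# Hypothetical per-rank flow for the proposed put_tensor_with_tp binding.
# Each rank writes only its local shard, so Mooncake never has to split
# the full tensor itself.
def save_sharded(store, key, local_shard, tp_size, tp_rank):
    # Argument order mirrors the C++ sketch above; assumes a 0-on-success
    # integer return code.
    rc = store.put_tensor_with_tp(key, local_shard, tp_size, tp_rank)
    if rc != 0:
        raise RuntimeError(f"put_tensor_with_tp failed on rank {tp_rank}: {rc}")

# On rank r of tp_size ranks, after computing the local shard:
#   save_sharded(store, "layer0.weight", shard, tp_size, r)
```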