
[Performance]: Why doesn't Mooncake increase the number of threads issuing WQEs to improve throughput when the block size is small?

Status: Open. hzt123123 opened this issue 1 month ago · 1 comment

Describe your performance question

When running ./transfer_engine_bench to transfer 4 KB blocks, the throughput is only 13.04 GB/s. If we increase the number of threads issuing WQEs, the throughput can be improved.

The experimental details are as follows:

export MC_MAX_WR=2048

./transfer_engine_bench --mode=target --metadata_server=etcd://xxx --local_server_name=abcd:12345 --device_name=mlx5_bond_1 --threads=8 --block_size=4096

./transfer_engine_bench --metadata_server=etcd://xxx --mode=initiator --segment_id=abcd:12345 --device_name=mlx5_bond_1 --threads=8 --batch_size=128 --block_size=4096
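A rough back-of-envelope reading of the reported number helps show why thread count matters at this block size. The symbols are introduced here for illustration only: T is the reported throughput, B the block size (--block_size=4096), N_threads the --threads value (8 in the initiator command above), and r_thread the per-thread rate of posted work requests. This assumes GB/s means 10^9 bytes per second, that each 4096-byte block maps to one WQE, and that the load is spread evenly across threads; none of these is confirmed in the issue.

```latex
\begin{aligned}
T &= N_{\text{threads}} \cdot r_{\text{thread}} \cdot B \\
R_{\text{ops}} = \frac{T}{B} &= \frac{13.04 \times 10^{9}\ \text{B/s}}{4096\ \text{B}} \approx 3.18 \times 10^{6}\ \text{WQE/s} \\
r_{\text{thread}} = \frac{R_{\text{ops}}}{N_{\text{threads}}} &\approx \frac{3.18 \times 10^{6}}{8} \approx 3.98 \times 10^{5}\ \text{WQE/s per thread}
\end{aligned}
```

If r_thread stayed roughly constant, raising --threads would raise T roughly proportionally until some other limit (NIC message rate, link bandwidth, or available CPU cores) takes over; whether that holds on this setup is what the question and the reply are probing.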


Posted by hzt123123, Nov 22 '25 04:11

You can try to add more submission threads: --threads=16

Reply by alogfans, Nov 24 '25 01:11
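The suggestion to try --threads=16 can be checked with a small sweep that re-runs the initiator command from the issue for several --threads values while keeping every other flag fixed. This is a hypothetical helper, not something shipped with Mooncake: the function name run_sweep, the BENCH and DRY_RUN variables, and the thread counts are assumptions for illustration, and etcd://xxx is kept as the issue's elided placeholder. Start the target side first with the target command from the issue.

```shell
#!/usr/bin/env bash
# Hypothetical sweep helper (not part of Mooncake): re-runs the initiator
# command from the issue once per --threads value, all other flags unchanged.
# Start the target side first.
set -euo pipefail

# Same setting as in the issue.
export MC_MAX_WR=2048

# Values copied from the issue; etcd://xxx is the issue's elided placeholder.
METADATA_SERVER="etcd://xxx"
SEGMENT_ID="abcd:12345"
DEVICE_NAME="mlx5_bond_1"
# Assumed path to the benchmark binary; override via the BENCH variable.
BENCH="${BENCH:-./transfer_engine_bench}"

# run_sweep THREADS...: one initiator run per thread count.
# If DRY_RUN is set and non-empty, each command is printed instead of executed.
run_sweep() {
  local threads
  for threads in "$@"; do
    echo "=== --threads=${threads} ==="
    ${DRY_RUN:+echo} "${BENCH}" \
      --metadata_server="${METADATA_SERVER}" \
      --mode=initiator \
      --segment_id="${SEGMENT_ID}" \
      --device_name="${DEVICE_NAME}" \
      --threads="${threads}" \
      --batch_size=128 \
      --block_size=4096
  done
}
```

For example, append `run_sweep 8 16 32` to the end of the script and run it with bash; setting DRY_RUN=1 first prints the commands without touching the NIC, which is a cheap way to double-check the flags. Holding --batch_size and --block_size constant means any change in the reported throughput can be attributed to the number of submission threads.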