Mooncake
[Performance]: Why doesn't Mooncake increase the number of threads issuing WQEs to improve throughput when the block size is small?
Describe your performance question
When running ./transfer_engine_bench with 4 KB blocks, the throughput is only 13.04 GB/s. Increasing the number of threads that issue WQEs improves the throughput.
The experimental setup is as follows:
export MC_MAX_WR=2048
./transfer_engine_bench --mode=target --metadata_server=etcd://xxx --local_server_name=abcd:12345 --device_name=mlx5_bond_1 --threads=8 --block_size=4096
./transfer_engine_bench --metadata_server=etcd://xxx --mode=initiator --segment_id=abcd:12345 --device_name=mlx5_bond_1 --threads=8 --batch_size=128 --block_size=4096
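To see why small blocks may be limited by the posting rate and not by link bandwidth, the reported figure can be converted into blocks per second per thread. This is a back-of-envelope sketch. It assumes "GB/s" means 10^9 bytes/s (not GiB/s) and that each 4 KB block corresponds to roughly one posted request; the benchmark's actual batching may differ.

```shell
#!/usr/bin/env bash
# Back-of-envelope: convert the reported bandwidth into a per-thread posting rate.
# Assumptions: 1 GB = 10^9 bytes, one block ~ one posted request/WQE.
set -euo pipefail

throughput_mb_per_s=13040   # 13.04 GB/s as reported, expressed in MB/s (10^6 bytes)
block_size=4096             # --block_size=4096
threads=8                   # --threads=8

# Bash arithmetic is integer-only, so results are truncated toward zero.
ops_per_sec=$(( throughput_mb_per_s * 1000000 / block_size ))
ops_per_thread=$(( ops_per_sec / threads ))
ns_per_op=$(( 1000000000 / ops_per_thread ))   # time budget per block, per thread

echo "total:      ${ops_per_sec} blocks/s"
echo "per thread: ${ops_per_thread} blocks/s"
echo "per block:  ~${ns_per_op} ns per thread"
```

Under these assumptions the run corresponds to about 3.18 million blocks/s overall, or roughly 398K blocks/s (about 2.5 µs per block) on each of the 8 threads. If that per-thread cost, and not the NIC, is the bottleneck, adding threads would be expected to raise throughput until some other limit is reached.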
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues and read the documentation
You can try adding more submission threads, e.g. --threads=16.
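One way to check where scaling stops is to rerun the initiator with several --threads values against the same target. The script below is a sketch that reuses only the flags from the commands in the issue. It assumes the target is already running at abcd:12345 and that ./transfer_engine_bench is in the current directory. The etcd address is elided in the issue and must be filled in. DRY_RUN is a hypothetical switch added here so the commands can be inspected before running.

```shell
#!/usr/bin/env bash
# Sweep the initiator's --threads value for 4 KB blocks.
# Assumptions: target already running at abcd:12345; binary in the current directory.
set -euo pipefail

export MC_MAX_WR=2048
METADATA_SERVER="etcd://xxx"   # elided in the issue; substitute your own etcd address
SEGMENT_ID="abcd:12345"
DEVICE="mlx5_bond_1"
DRY_RUN="${DRY_RUN:-1}"        # hypothetical switch: 1 = print commands, 0 = execute

# Build the initiator command as an array so each flag stays a single argument.
run_initiator() {
  local threads="$1"
  local -a cmd=(./transfer_engine_bench --metadata_server="$METADATA_SERVER"
                --mode=initiator --segment_id="$SEGMENT_ID" --device_name="$DEVICE"
                --threads="$threads" --batch_size=128 --block_size=4096)
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[*]}"           # print the space-joined command line
  else
    "${cmd[@]}"                # run the benchmark
  fi
}

for t in 8 16 32; do
  echo "== threads=${t} =="
  run_initiator "$t"
done
```

Comparing the reported throughput across runs shows whether the extra submission threads help or whether another limit (NIC, PCIe, or the target side) dominates.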