[Performance]: Why does the bandwidth first increase and then decrease as the batch size increases?
Describe your performance question
As shown in the table, when running ./transfer_engine_bench on two H20 servers, the bandwidth initially increases and then decreases as the batch size increases.
The experimental details are as follows:

```shell
export MC_WORKERS_PER_CTX=1
export MC_MAX_WR=2048

# target
./transfer_engine_bench --mode=target --metadata_server=etcd://xxx --local_server_name=abcd:12345 --device_name=mlx5_bond_1 --threads=8 --block_size=8192

# initiator
./transfer_engine_bench --metadata_server=etcd://xxx --mode=initiator --segment_id=abcd:12345 --device_name=mlx5_bond_1 --threads=8 --batch_size=xx --block_size=8192
```
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues and read the documentation
I suspect this is because the time complexity of the code below is O(n^2 * m), where n is the batch size. Why is it designed this way? From my understanding, polling the CQ should only cost O(n * m).
@hzt123123 We provide an API getTransferStatus(batch_id, status) to aggregate the result of all tasks.
In addition, as you can see in this code, we wait until all tasks in a batch are completed before submitting the next batch of tasks (rather than pipelining). So an excessive number of tasks may cause the main thread to wait longer.
Why is getBatchTransferStatus included in the API getTransferStatus?
TE Transfer is a batch-based API, so we need to retrieve the status of the entire batch.