perftest
perftest copied to clipboard
Perftest: fix ending conditions for non-standard -t/-q/-n values
It was revealed that in certain combination of tx_depth, connection number, cq_mod and iteration values the code wrongly determines ending conditions.
Especially, in the while loop condition a wrong comparison was used in which a mix of signed and unsigned variables are summed or subtracked without minding the resulting sign:
(ctx->scnt[index] - ctx->ccnt[index] + user_param->post_list) <= (user_param->tx_depth) In this case when scnt is lower than ccnt by more than post_list we end up with a high number compared to tx_depth just because the scnt is unsigned.
In another case, for multiple qp involved, when cq_mod is not a divider of iters we may end up in situation when: totccnt >= tot_iters and there is an i-th connection with ctx->ccnt[$i] < iters In such scenario the test will not be able to complete.
Commands for reproduction: ib_read_bw -R -s 1024 -n 10000 -q 5 -t 12 ib_send_bw -a -b -R -t 83 -r 41 OS: FreeBSD 12/13
Sponsored-by: Intel Corporation Signed-off-by: Bartosz Sobczak [email protected]
In this case when scnt is lower than ccnt
How can you get more completions than WQEs you have posted?
when cq_mod is not a divider of iters
I thought we fail if the number of iterations is not a multiple of post_list
, and if post_list
is not a multiple of cq_mod
and therefore iterations must be a multiple of cq_mod
. Can you please clarify?
In this case when scnt is lower than ccnt
How can you get more completions than WQEs you have posted?
During the testing i got this example:
run_iter_bw:3335 54 10000 48 1 12 7, 55 60
run_iter_bw:3335 55 10000 48 1 12 8, 56 60
run_iter_bw:3335 56 10000 48 1 12 9, 57 60
run_iter_bw:3335 57 10000 48 1 12 10, 58 60
run_iter_bw:3335 58 10000 48 1 12 11, 59 60
run_iter_bw:3335 59 10000 48 1 12 12, 60 60
run_iter_bw:3335 84 10000 96 1 12 18446744073709551605, 85 108
run_iter_bw:3335 85 10000 96 1 12 18446744073709551606, 86 108
run_iter_bw:3335 86 10000 96 1 12 18446744073709551607, 87 108
run_iter_bw:3335 87 10000 96 1 12 18446744073709551608, 88 108
run_iter_bw:3335 88 10000 96 1 12 18446744073709551609, 89 108
run_iter_bw:3335 89 10000 96 1 12 18446744073709551610, 90 108
run_iter_bw:3335 90 10000 96 1 12 18446744073709551611, 91 108
run_iter_bw:3335 91 10000 96 1 12 18446744073709551612, 92 108
run_iter_bw:3335 92 10000 96 1 12 18446744073709551613, 93 108
run_iter_bw:3335 93 10000 96 1 12 18446744073709551614, 94 108
run_iter_bw:3335 94 10000 96 1 12 18446744073709551615, 95 108
run_iter_bw:3335 95 10000 96 1 12 0, 96 108
run_iter_bw:3335 96 10000 96 1 12 1, 97 108
run_iter_bw:3335 97 10000 96 1 12 2, 98 108
and the values are: ctx->scnt[index], 84 user_param->iters, 10000 ctx->ccnt[index], 96 user_param->post_list, 1 user_param->tx_depth, 12 (ctx->scnt[index] - ctx->ccnt[index] + user_param->post_list), 18446744073709551605 (should be -11) (ctx->scnt[index] + user_param->post_list), 85 (user_param->tx_depth + ctx->ccnt[index]), 108
I agree the parameters are not that logical, as they should be in normal usage, but i do want to concentrate on the fact, that this is erronous treatment of unsigned variable and solely for that reason it should be fixed.
when cq_mod is not a divider of iters
I thought we fail if the number of iterations is not a multiple of
post_list
, and ifpost_list
is not a multiple ofcq_mod
and therefore iterations must be a multiple ofcq_mod
. Can you please clarify?
I see your point here, and this is logical statement. Although i didn't see that in any readme or help of the tool nor the input parameters are validated?
I agree the parameters are not that logical, as they should be in normal usage, but i do want to concentrate on the fact, that this is erronous treatment of unsigned variable and solely for that reason it should be fixed.
Yup, don't get me wrong, I'm okay with the change but the reason behind it didn't make sense to me.
nor the input parameters are validated?
See https://github.com/linux-rdma/perftest/blob/master/src/perftest_parameters.c#L1121-L1137. Although we might be missing the same check for user_param->recv_post_list
.
nor the input parameters are validated?
See https://github.com/linux-rdma/perftest/blob/master/src/perftest_parameters.c#L1121-L1137. Although we might be missing the same check for
user_param->recv_post_list
.
thanks, missed that, however, my example of failure shows 'post_list=1', so that is escaping these checks. and so my example of commands in commit description also tend to operate on tx_depth, but post_list is standrad (1)
Hi @bsobczak Thanks for your contribution! I understand the issues you are encountering and agree with you that it must be fixed. But I'm worried about the if else added in run_iter_bw because it may cause performance issues since its in middle of the performance measuring
Hi @bsobczak, When setting a duration greater then 1 the results are reset. command for example: ib_read_bw --duration 5
It's related in some way to the 'duration_param->state' that change to START_STATE during the run while it should be SAMPLE_STATE.
PR was merged with fixes