perftest icon indicating copy to clipboard operation
perftest copied to clipboard

Perftest: fix ending conditions for non-standard -t/-q/-n values

Open bsobczak opened this issue 2 years ago • 6 comments

It was revealed that in certain combination of tx_depth, connection number, cq_mod and iteration values the code wrongly determines ending conditions.

Especially, in the while loop condition a wrong comparison was used in which a mix of signed and unsigned variables are summed or subtracked without minding the resulting sign:

(ctx->scnt[index] - ctx->ccnt[index] + user_param->post_list) <= (user_param->tx_depth) In this case when scnt is lower than ccnt by more than post_list we end up with a high number compared to tx_depth just because the scnt is unsigned.

In another case, for multiple qp involved, when cq_mod is not a divider of iters we may end up in situation when: totccnt >= tot_iters and there is an i-th connection with ctx->ccnt[$i] < iters In such scenario the test will not be able to complete.

Commands for reproduction: ib_read_bw -R -s 1024 -n 10000 -q 5 -t 12 ib_send_bw -a -b -R -t 83 -r 41 OS: FreeBSD 12/13

Sponsored-by: Intel Corporation Signed-off-by: Bartosz Sobczak [email protected]

bsobczak avatar Jul 05 '22 06:07 bsobczak

In this case when scnt is lower than ccnt

How can you get more completions than WQEs you have posted?

when cq_mod is not a divider of iters

I thought we fail if the number of iterations is not a multiple of post_list, and if post_list is not a multiple of cq_mod and therefore iterations must be a multiple of cq_mod. Can you please clarify?

firasj avatar Jul 05 '22 16:07 firasj

In this case when scnt is lower than ccnt

How can you get more completions than WQEs you have posted?

During the testing i got this example:

run_iter_bw:3335 54 10000 48 1 12 7, 55 60
run_iter_bw:3335 55 10000 48 1 12 8, 56 60
run_iter_bw:3335 56 10000 48 1 12 9, 57 60
run_iter_bw:3335 57 10000 48 1 12 10, 58 60
run_iter_bw:3335 58 10000 48 1 12 11, 59 60
run_iter_bw:3335 59 10000 48 1 12 12, 60 60
run_iter_bw:3335 84 10000 96 1 12 18446744073709551605, 85 108
run_iter_bw:3335 85 10000 96 1 12 18446744073709551606, 86 108
run_iter_bw:3335 86 10000 96 1 12 18446744073709551607, 87 108
run_iter_bw:3335 87 10000 96 1 12 18446744073709551608, 88 108
run_iter_bw:3335 88 10000 96 1 12 18446744073709551609, 89 108
run_iter_bw:3335 89 10000 96 1 12 18446744073709551610, 90 108
run_iter_bw:3335 90 10000 96 1 12 18446744073709551611, 91 108
run_iter_bw:3335 91 10000 96 1 12 18446744073709551612, 92 108
run_iter_bw:3335 92 10000 96 1 12 18446744073709551613, 93 108
run_iter_bw:3335 93 10000 96 1 12 18446744073709551614, 94 108
run_iter_bw:3335 94 10000 96 1 12 18446744073709551615, 95 108
run_iter_bw:3335 95 10000 96 1 12 0, 96 108
run_iter_bw:3335 96 10000 96 1 12 1, 97 108
run_iter_bw:3335 97 10000 96 1 12 2, 98 108

and the values are: ctx->scnt[index], 84 user_param->iters, 10000 ctx->ccnt[index], 96 user_param->post_list, 1 user_param->tx_depth, 12 (ctx->scnt[index] - ctx->ccnt[index] + user_param->post_list), 18446744073709551605 (should be -11) (ctx->scnt[index] + user_param->post_list), 85 (user_param->tx_depth + ctx->ccnt[index]), 108

I agree the parameters are not that logical, as they should be in normal usage, but i do want to concentrate on the fact, that this is erronous treatment of unsigned variable and solely for that reason it should be fixed.

when cq_mod is not a divider of iters

I thought we fail if the number of iterations is not a multiple of post_list, and if post_list is not a multiple of cq_mod and therefore iterations must be a multiple of cq_mod. Can you please clarify?

I see your point here, and this is logical statement. Although i didn't see that in any readme or help of the tool nor the input parameters are validated?

bsobczak avatar Jul 18 '22 12:07 bsobczak

I agree the parameters are not that logical, as they should be in normal usage, but i do want to concentrate on the fact, that this is erronous treatment of unsigned variable and solely for that reason it should be fixed.

Yup, don't get me wrong, I'm okay with the change but the reason behind it didn't make sense to me.

nor the input parameters are validated?

See https://github.com/linux-rdma/perftest/blob/master/src/perftest_parameters.c#L1121-L1137. Although we might be missing the same check for user_param->recv_post_list.

firasj avatar Jul 18 '22 13:07 firasj

nor the input parameters are validated?

See https://github.com/linux-rdma/perftest/blob/master/src/perftest_parameters.c#L1121-L1137. Although we might be missing the same check for user_param->recv_post_list.

thanks, missed that, however, my example of failure shows 'post_list=1', so that is escaping these checks. and so my example of commands in commit description also tend to operate on tx_depth, but post_list is standrad (1)

bsobczak avatar Jul 18 '22 13:07 bsobczak

Hi @bsobczak Thanks for your contribution! I understand the issues you are encountering and agree with you that it must be fixed. But I'm worried about the if else added in run_iter_bw because it may cause performance issues since its in middle of the performance measuring

HassanKhadour avatar Aug 03 '22 14:08 HassanKhadour

Hi @bsobczak, When setting a duration greater then 1 the results are reset. command for example: ib_read_bw --duration 5

It's related in some way to the 'duration_param->state' that change to START_STATE during the run while it should be SAMPLE_STATE.

sshaulnv avatar Aug 09 '22 14:08 sshaulnv

PR was merged with fixes

HassanKhadour avatar Jan 14 '24 12:01 HassanKhadour