rdma-core efa: Fix queue creation size uint32

Fix possible overflow when calculating sq_desc_cnt and rq_desc_cnt and fallback to QP_DESC_COUNT_MAX=BIT(31) which is the maximum power of 2 in a uint32_t variable.

Signed-off-by: Shuki Zanyovka [email protected] Change-Id: If8041a9845d7460d04c7e14a7d738230465d2276

Sep 14 '22 09:09 shuki-zanyovka

Please remove ChangeID

Sep 21 '22 09:09 rleon

It is my third time that I'm asking you to remove ChangeID.

Sep 21 '22 11:09 rleon

It is my third time that I'm asking you to remove ChangeID.

Fixed.

Sep 21 '22 11:09 shuki-zanyovka

@rleon Our initial intention was making efa_setup_qp() more robust as its current implementation has internal calculations that, although very unlikely, can be tackled by external (seemingly valid) inputs. Agree with the approach of keeping the code lean as possible but we should try to avoid inexplicit failure behavior as well. @shuki-zanyovka suggestion was adding a "roundup_and_limit" function to improve robustness with minimal compromise on simplicity.

Oct 06 '22 09:10 mrgolin

@rleon Our initial intention was making efa_setup_qp() more robust as its current implementation has internal calculations that, although very unlikely, can be tackled by external (seemingly valid) inputs. Agree with the approach of keeping the code lean as possible but we should try to avoid inexplicit failure behavior as well. @shuki-zanyovka suggestion was adding a "roundup_and_limit" function to improve robustness with minimal compromise on simplicity.

And what will it give? rdma-core runs as part of user application, mistake there will cause to segfault in application. Why is it different from any other user mistakes that one can do?

Oct 06 '22 10:10 rleon

@rleon Our initial intention was making efa_setup_qp() more robust as its current implementation has internal calculations that, although very unlikely, can be tackled by external (seemingly valid) inputs. Agree with the approach of keeping the code lean as possible but we should try to avoid inexplicit failure behavior as well. @shuki-zanyovka suggestion was adding a "roundup_and_limit" function to improve robustness with minimal compromise on simplicity.

And what will it give? rdma-core runs as part of user application, mistake there will cause to segfault in application. Why is it different from any other user mistakes that one can do?

@rleon , The mistake in this case will result in a wrong value sent to the EFA firmware and not a segfault / error or whatever in the application. Please see the last comment above.

Oct 06 '22 10:10 shuki-zanyovka

@rleon Our initial intention was making efa_setup_qp() more robust as its current implementation has internal calculations that, although very unlikely, can be tackled by external (seemingly valid) inputs. Agree with the approach of keeping the code lean as possible but we should try to avoid inexplicit failure behavior as well. @shuki-zanyovka suggestion was adding a "roundup_and_limit" function to improve robustness with minimal compromise on simplicity.

And what will it give? rdma-core runs as part of user application, mistake there will cause to segfault in application. Why is it different from any other user mistakes that one can do?

@rleon , The mistake in this case will result in a wrong value sent to the EFA firmware and not a segfault / error or whatever in the application. Please see the last comment above.

I saw the comment and just gave more severe outcome.

The wrong values that were sent to FW can happen anytime, as I can use my own version of rdma-core (without any checks) and my own kernel. Your FW must be protected from such flows.

Oct 06 '22 10:10 rleon

@rleon Our initial intention was making efa_setup_qp() more robust as its current implementation has internal calculations that, although very unlikely, can be tackled by external (seemingly valid) inputs. Agree with the approach of keeping the code lean as possible but we should try to avoid inexplicit failure behavior as well. @shuki-zanyovka suggestion was adding a "roundup_and_limit" function to improve robustness with minimal compromise on simplicity.

And what will it give? rdma-core runs as part of user application, mistake there will cause to segfault in application. Why is it different from any other user mistakes that one can do?

@rleon , The mistake in this case will result in a wrong value sent to the EFA firmware and not a segfault / error or whatever in the application. Please see the last comment above.

I saw the comment and just gave more severe outcome.

The wrong values that were sent to FW can happen anytime, as I can use my own version of rdma-core (without any checks) and my own kernel. Your FW must be protected from such flows.

@rleon, The firmware supports these values and these flows but they are not sent correctly to the firmware.

Oct 06 '22 11:10 shuki-zanyovka

@rleon How we can move further with this?

Nov 08 '22 12:11 mrgolin

@mrgolin @shuki-zanyovka can you please summarize what you're trying to fix here? You started with a custom provider that bypasses qp limits check and it got more and more complicated over time..

Nov 08 '22 12:11 gal-pressman

@mrgolin @shuki-zanyovka can you please summarize what you're trying to fix here? You started with a custom provider that bypasses qp limits check and it got more and more complicated over time..

@galpress ,

On the receive queue: qp->rq.wq.desc_mask defined as a uint16. uint16_t efa_wq::desc_mask

On the EFA, we have: ctx->max_rq_wr 32768 ctx->max_rq_sge 3

==> The following condition will pass with no error if attr->cap.max_recv_wr == 32767 and max_recv_sge == 3 32767 * 3 = 98301 > MAX(uint16) ==> It will yield an overflow error.

if (attr->cap.max_recv_sge > ctx->max_rq_sge) {
	verbs_err(&ctx->ibvctx,
		  "Max receive SGE %u > %u\n", attr->cap.max_recv_sge,
		  ctx->max_rq_sge);
	return EINVAL;
}

if (attr->cap.max_recv_wr > ctx->max_rq_wr) {
	verbs_err(&ctx->ibvctx,
		  "Max receive WR %u > %u\n", attr->cap.max_recv_wr,
		  ctx->max_rq_wr);
	return EINVAL;
}

On the send queue, we have: On the send queue: We have uint32_t efa_wq::wqe_cnt and ctx->max_sq_wr 4096

Indeed, in this scenario there wouldn't be an issue for the values in the valid range.

Nov 09 '22 10:11 shuki-zanyovka

if (attr->cap.max_recv_wr > ctx->max_rq_wr) {
	verbs_err(&ctx->ibvctx,
		  "Max receive WR %u > %u\n", attr->cap.max_recv_wr,
		  ctx->max_rq_wr);
	return EINVAL;
}

For RQs max_rq_wr is misleading.. The driver reports the max RQ depth in units of descriptors, so the comparison of max_recv_wr and max_rq_wr is broken as they're in different units.

I believe the fix here should be changing efa_check_qp_limits() to test whether max_recv_wr * max_rq_sge is greater than max_rq_wr? I'm pretty sure I even had such patch laying around somewhere, maybe you can find it.

Nov 09 '22 15:11 gal-pressman

max_rq_wr

Does it mean we need to compare it to: roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr) ?

I.e.

max_recv_descs = roundup_pow_of_two(attr->cap.max_recv_sge * attr->cap.max_recv_wr); if (max_recv_descs > ctx->max_rq_wr) { verbs_err(&ctx->ibvctx, "Max receive WR %u > %u\n", attr->cap.max_recv_wr, ctx->max_rq_wr); return EINVAL; }

This might be a solution for this issue - but still, I think it makes sense to change the variables from uint16_t to uint32_t just to be on the safe side.

We might not need the code to fix the result of roundup_pow_of_two() if it overflows.

Nov 10 '22 17:11 shuki-zanyovka

max_rq_wr

Does it mean we need to compare it to: roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr) ?

I.e.

max_recv_descs = roundup_pow_of_two(attr->cap.max_recv_sge * attr->cap.max_recv_wr); if (max_recv_descs > ctx->max_rq_wr) { verbs_err(&ctx->ibvctx, "Max receive WR %u > %u\n", attr->cap.max_recv_wr, ctx->max_rq_wr); return EINVAL; }

Yes, something along these lines looks correct.

This might be a solution for this issue - but still, I think it makes sense to change the variables from uint16_t to uint32_t just to be on the safe side.

I would start by pushing what fixes the issue.

Nov 13 '22 13:11 gal-pressman

max_rq_wr

Does it mean we need to compare it to: roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr) ? I.e. max_recv_descs = roundup_pow_of_two(attr->cap.max_recv_sge * attr->cap.max_recv_wr); if (max_recv_descs > ctx->max_rq_wr) { verbs_err(&ctx->ibvctx, "Max receive WR %u > %u\n", attr->cap.max_recv_wr, ctx->max_rq_wr); return EINVAL; }

Yes, something along these lines looks correct.

This might be a solution for this issue - but still, I think it makes sense to change the variables from uint16_t to uint32_t just to be on the safe side.

I would start by pushing what fixes the issue.

@gal-pressman , We're looking at the solution you propose here, but still - even if we take your solution, can we also apply the following patch here to be on the safe side (to handle only the uint32_t issue)?

static void efa_setup_qp(struct efa_context *ctx,
			 struct efa_qp *qp,
			 struct ibv_qp_cap *cap,
			 size_t page_size)
{
	uint32_t rq_desc_cnt_u32;

	efa_qp_init_indices(qp);

	**qp->sq.wq.wqe_cnt = (uint32_t)roundup_pow_of_two(max_t(uint32_t, cap->max_send_wr, ctx->min_sq_wr));**
	qp->sq.wq.max_sge = cap->max_send_sge;
	qp->sq.wq.desc_mask = qp->sq.wq.wqe_cnt - 1;

	qp->rq.wq.max_sge = cap->max_recv_sge;
	**rq_desc_cnt_u32 = (uint32_t)roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr);**
	qp->rq.wq.desc_mask = rq_desc_cnt_u32 - 1;
	qp->rq.wq.wqe_cnt = rq_desc_cnt_u32 / qp->rq.wq.max_sge;

	qp->page_size = page_size;
}

Nov 14 '22 14:11 shuki-zanyovka

max_rq_wr

Does it mean we need to compare it to: roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr) ? I.e. max_recv_descs = roundup_pow_of_two(attr->cap.max_recv_sge * attr->cap.max_recv_wr); if (max_recv_descs > ctx->max_rq_wr) { verbs_err(&ctx->ibvctx, "Max receive WR %u > %u\n", attr->cap.max_recv_wr, ctx->max_rq_wr); return EINVAL; }

Yes, something along these lines looks correct.

This might be a solution for this issue - but still, I think it makes sense to change the variables from uint16_t to uint32_t just to be on the safe side.

I would start by pushing what fixes the issue.

@gal-pressman , We're looking at the solution you propose here, but still - even if we take your solution, can we also apply the following patch here to be on the safe side (to handle only the uint32_t issue)?
static void efa_setup_qp(struct efa_context *ctx,
			 struct efa_qp *qp,
			 struct ibv_qp_cap *cap,
			 size_t page_size)
{
	uint32_t rq_desc_cnt_u32;

	efa_qp_init_indices(qp);

	**qp->sq.wq.wqe_cnt = (uint32_t)roundup_pow_of_two(max_t(uint32_t, cap->max_send_wr, ctx->min_sq_wr));**
	qp->sq.wq.max_sge = cap->max_send_sge;
	qp->sq.wq.desc_mask = qp->sq.wq.wqe_cnt - 1;

	qp->rq.wq.max_sge = cap->max_recv_sge;
	**rq_desc_cnt_u32 = (uint32_t)roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr);**
	qp->rq.wq.desc_mask = rq_desc_cnt_u32 - 1;
	qp->rq.wq.wqe_cnt = rq_desc_cnt_u32 / qp->rq.wq.max_sge;

	qp->page_size = page_size;
}

Please explain the issue?

Nov 14 '22 14:11 gal-pressman

max_rq_wr

Does it mean we need to compare it to: roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr) ? I.e. max_recv_descs = roundup_pow_of_two(attr->cap.max_recv_sge * attr->cap.max_recv_wr); if (max_recv_descs > ctx->max_rq_wr) { verbs_err(&ctx->ibvctx, "Max receive WR %u > %u\n", attr->cap.max_recv_wr, ctx->max_rq_wr); return EINVAL; }

Yes, something along these lines looks correct.

This might be a solution for this issue - but still, I think it makes sense to change the variables from uint16_t to uint32_t just to be on the safe side.

I would start by pushing what fixes the issue.

@gal-pressman , We're looking at the solution you propose here, but still - even if we take your solution, can we also apply the following patch here to be on the safe side (to handle only the uint32_t issue)?
static void efa_setup_qp(struct efa_context *ctx,
			 struct efa_qp *qp,
			 struct ibv_qp_cap *cap,
			 size_t page_size)
{
	uint32_t rq_desc_cnt_u32;

	efa_qp_init_indices(qp);

	**qp->sq.wq.wqe_cnt = (uint32_t)roundup_pow_of_two(max_t(uint32_t, cap->max_send_wr, ctx->min_sq_wr));**
	qp->sq.wq.max_sge = cap->max_send_sge;
	qp->sq.wq.desc_mask = qp->sq.wq.wqe_cnt - 1;

	qp->rq.wq.max_sge = cap->max_recv_sge;
	**rq_desc_cnt_u32 = (uint32_t)roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr);**
	qp->rq.wq.desc_mask = rq_desc_cnt_u32 - 1;
	qp->rq.wq.wqe_cnt = rq_desc_cnt_u32 / qp->rq.wq.max_sge;

	qp->page_size = page_size;
}
Please explain the issue?

A few points:

The current code is:


static void efa_setup_qp(struct efa_context *ctx,
			 struct efa_qp *qp,
			 struct ibv_qp_cap *cap,
			 size_t page_size)
{
	uint16_t rq_desc_cnt;

	efa_qp_init_indices(qp);

	qp->sq.wq.wqe_cnt = roundup_pow_of_two(max_t(uint32_t, cap->max_send_wr,
						     ctx->min_sq_wr));
	qp->sq.wq.max_sge = cap->max_send_sge;
	qp->sq.wq.desc_mask = qp->sq.wq.wqe_cnt - 1;

	qp->rq.wq.max_sge = cap->max_recv_sge;
	rq_desc_cnt = roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr);
	qp->rq.wq.desc_mask = rq_desc_cnt - 1;
	qp->rq.wq.wqe_cnt = rq_desc_cnt / qp->rq.wq.max_sge;

	qp->page_size = page_size;
}

rq_desc_cnt is declared as a uint16: uint16_t rq_desc_cnt; Then, we assign a uint64 value to a uint16: rq_desc_cnt = **(uint64)**roundup_pow_of_two(cap->max_recv_sge * cap->max_recv_wr); At this point, it is possible that rq_desc_cnt is truncated and thus is assigned 0 ==> which goes all the way to the driver and causes a failure to initialize.
A minor but still a possible issue if we update the maximum value for cap->max_send_wr in our device: qp->sq.wq.wqe_cnt is declared as a uint32_t while roundup_pow_of_two() returns a uint64_t ==> With high values we can expect a truncation here. qp->sq.wq.wqe_cnt = roundup_pow_of_two(max_t(uint32_t, cap->max_send_wr, ctx->min_sq_wr));

Nov 14 '22 15:11 shuki-zanyovka

I don't see how your suggested code solves anything. There are some inconsistencies in the efa_wq area, but I believe the root cause for all is that the size of the doorbell itself is u16, not u32 (at least that what it looks like from efa_rq_ring_doorbell()/efa_sq_ring_doorbell()).

If the size of doorbell's producer counter is 16 bits, obviously there's no need to change desc_mask to u32.

Nov 14 '22 15:11 gal-pressman

rdma-core
rdma-core copied to clipboard

efa: Fix queue creation size uint32_t overflow

rdma-core rdma-core copied to clipboard

efa: Fix queue creation size uint32_t overflow

rdma-core
rdma-core copied to clipboard