uhd icon indicating copy to clipboard operation
uhd copied to clipboard

sr_write does not propperly prevent from overwriting send-ethernet-buffer when writing too fast

Open Kuurusch opened this issue 6 years ago • 17 comments

Issue Description

We have encountered, that when we write too many values too fast one after each other with sr_write() on a device3 (RFNoC), we get en error where it says, some packet-number is before an other. But this comes, because we overwrite a full buffer. When we write slower (with more time between each sr_write()-call, it works without problem.

But I would expect, that this is handled by the API and it should check, before writing to the ethernet-send-buffer, if it is not full!!

Kuurusch avatar Mar 04 '19 08:03 Kuurusch

@Kuurusch On which device?

mbr0wn avatar Mar 04 '19 17:03 mbr0wn

Also, is this on a custom RFNoC block? If so, did you set CMD_FIFO_SIZE to a custom value?

mbr0wn avatar Mar 04 '19 19:03 mbr0wn

If it is a custom RFNoC Block, try setting CMD_FIFO_SIZE to 8 (see noc_block_radio_core.v).

mbr0wn avatar Mar 05 '19 01:03 mbr0wn

(If that works, it's still a bug on our side, but it would narrow down the source).

mbr0wn avatar Mar 05 '19 01:03 mbr0wn

@Kuurusch On which device?

on a x310 with ubx-160 daughterboards.

Kuurusch avatar Mar 12 '19 08:03 Kuurusch

If it is a custom RFNoC Block, try setting CMD_FIFO_SIZE to 8 (see noc_block_radio_core.v).

Yes, it is a custome RFNoC-block. We will try, thank you very much!

Kuurusch avatar Mar 12 '19 08:03 Kuurusch

The error we get is the following: terminate called after throwing an instance of 'uhd::io_error' what(): EnvironmentError: IOError: [0/AGCCalibration_1] sr_write() failed: EnvironmentError: IOError: Block ctrl (CE_10_Port_D0) packet parse error - EnvironmentError: IOError: Expected packet index: 1129 Received index: 1147

Kuurusch avatar Mar 12 '19 08:03 Kuurusch

So we've resized the fifo, but no difference!

Kuurusch avatar Mar 12 '19 15:03 Kuurusch

@Kuurusch Huh, I was not expecting that. We have a fix for a (maybe? hopefully?) related but in the pipeline, give us a few days to push it out and then we'll try again.

mbr0wn avatar Mar 14 '19 17:03 mbr0wn

@mbr0wn Thank you very much!

Kuurusch avatar Mar 14 '19 17:03 Kuurusch

@Kuurusch You can apply this in the meantime and see if it helps:

From e9dc02d796e8a3206d3847c7d24fc39e78dff472 Mon Sep 17 00:00:00 2001
From: Sugandha Gupta <[email protected]>
Date: Tue, 12 Mar 2019 10:27:23 -0700
Subject: [PATCH] rfnoc: noc_shell: Change default address for the reg
 readbacks

The rb_stb signal is used in the user logic (e.g. simple_spi_core
in the radio) as a means to back pressure and prevent spi setting
register writes during ongoing spi transactions. While in the
noc_shell register it is used more as a "valid" or "done" signal
for register reads. Without this fix, the default rb_addr is the
last used rb address and can affect the behavior of rb_stb/back
pressure in the spi core, which eventually results in spi reg
writes during a spi transaction. Changing the default to RB_USER,
allows the user to control the behavior of rb_stb.
---
 usrp3/lib/rfnoc/cmd_pkt_proc.v | 6 +++++-
 usrp3/lib/rfnoc/noc_shell.v    | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/usrp3/lib/rfnoc/cmd_pkt_proc.v b/usrp3/lib/rfnoc/cmd_pkt_proc.v
index 1c1f422db..8506a89bf 100644
--- a/usrp3/lib/rfnoc/cmd_pkt_proc.v
+++ b/usrp3/lib/rfnoc/cmd_pkt_proc.v
@@ -52,6 +52,7 @@ module cmd_pkt_proc #(
   // TODO: Eliminate extra readback address output once NoC Shell / user register readback address spaces are merged
   parameter SR_RB_ADDR = 0,         // Settings bus address to set NoC Shell readback address register
   parameter SR_RB_ADDR_USER = 1,    // Settings bus address to set user readback address register
+  parameter DEFAULT_RB_ADDR = 0,    // Default address for the readback address bus
   parameter FIFO_SIZE = 5           // Depth of command FIFO
 )(
   input clk, input reset, input clear,
@@ -150,7 +151,7 @@ module cmd_pkt_proc #(
       resp_tvalid         <= 1'b0;
       set_stb             <= 1'b0;
       set_has_time        <= 1'b0;
-      rb_addr             <= 'd0;
+      rb_addr             <= DEFAULT_RB_ADDR;
       rb_addr_user        <= 'd0;
     end else begin
       case (state)
@@ -212,6 +213,9 @@ module cmd_pkt_proc #(
               rb_addr       <= int_tdata[RB_AWIDTH-1:0];
             end else if (set_rb_addr_user) begin
               rb_addr_user  <= int_tdata[RB_USER_AWIDTH-1:0];
+              rb_addr       <= DEFAULT_RB_ADDR;
+            end else begin
+              rb_addr       <= DEFAULT_RB_ADDR;
             end
             set_stb         <= 1'b1;
             if (int_tlast) begin
diff --git a/usrp3/lib/rfnoc/noc_shell.v b/usrp3/lib/rfnoc/noc_shell.v
index b66688aed..aef26a3b2 100644
--- a/usrp3/lib/rfnoc/noc_shell.v
+++ b/usrp3/lib/rfnoc/noc_shell.v
@@ -249,6 +249,7 @@ module noc_shell
          .USE_TIME(USE_TIMED_CMDS),
          .SR_RB_ADDR(SR_RB_ADDR),
          .SR_RB_ADDR_USER(SR_RB_ADDR_USER),
+         .DEFAULT_RB_ADDR(RB_USER_RB_DATA),
          .FIFO_SIZE(CMD_FIFO_SIZE[8*k+7:8*k]))
        cmd_pkt_proc (
          .clk(clk), .reset(reset), .clear(1'b0),
-- 
2.20.1

mbr0wn avatar Mar 20 '19 21:03 mbr0wn

@mbr0wn Hi, we hadn't time to test this out but we just moved from UHD 3.14.0.0-rc1 to the final release 3.14.0.0. But we saw, that nothing has changed in the FPGA-files. Where have you introduced your changes?

Kuurusch avatar Apr 23 '19 15:04 Kuurusch

Good question; they haven't been merged yet. I will ping the team and see what's up with that changeset.

mbr0wn avatar Apr 23 '19 16:04 mbr0wn

I'm having a similar issue: IOError: Block ctrl (CE_02_Port_30) packet parse error - EnvironmentError: IOError: Expected packet index: 48 Received index: 49

I changed CMD_FIFO_SIZE to 8 and applied the patch above but it did not have any effect.

I noticed the radio block also sets RESP_FIFO_SIZE to 5. This buffers response packets coming out of the command packet processor. Could this help? I'm currently building this now..

rigoorozco avatar Jun 14 '19 21:06 rigoorozco

hi current I'm having a similar issue problem in X310 in UHD-3.14.1.1-release

when I do this

usrp_->set_rx_rate(sampRate);

the sampeRate is 50e6, it's very easy to throw this exception when I recevie huge data. uhd::io_error EnvironmentError: IOError: [0/Radio_0] sr_write() failed: EnvironmentError: IOError: Block ctrl (CE_01_Port_40) packet parse error - EnvironmentError: IOError: Expected packet index: 2360 Received index: 2361 I just use the function RecevieNSymbol some times, then UHD will throw exception. the recvSymbol is about 10e6.

For testing this, I download gnuradio and use `/usr/local/bin/uhd_fft -f800e6 -s50e6' , it also will throw this exception when it's running for about 1minutes. When I change the sample rate to 20e6, it seems work well.

void UsrpObj::ReceiveNSymbol(ComplexType *hostBuff, size_t nSymbol)
{
	uhd::stream_cmd_t stream_cmd(uhd::stream_cmd_t::STREAM_MODE_START_CONTINUOUS);
	stream_cmd.num_samps = nSymbol;
	stream_cmd.stream_now = true;
	stream_cmd.time_spec = uhd::time_spec_t();
	//stream_cmd.time_spec = usrp_->get_time_now();

	uhd::stream_args_t stream_args("fc32", "sc16");
	std::vector<size_t> channel_nums;
	channel_nums.push_back(boost::lexical_cast<size_t>(0));
	stream_args.channels = channel_nums;
	uhd::rx_streamer::sptr rx_stream = usrp_->get_rx_stream(stream_args);

	rx_stream->issue_stream_cmd(stream_cmd);

	uhd::rx_metadata_t md;

	size_t samplePerBuff = 0;
	size_t toWrite = 0;
	size_t nSampsLeft = nSymbol;
	const int kSAMPLE_PER_BUFF = rx_stream->get_max_num_samps();
	cout << boost::format("sample_per_buff = %d\n") % kSAMPLE_PER_BUFF;
	size_t numRxSamps;
	const auto start = std::chrono::steady_clock::now();
	while (nSampsLeft != 0)
	{
		if (nSampsLeft >= kSAMPLE_PER_BUFF)
			samplePerBuff = kSAMPLE_PER_BUFF;
		else
			samplePerBuff = nSampsLeft;
		// std::cout << nSampsLeft << std::endl;
		try {
			numRxSamps = rx_stream->recv(hostBuff + toWrite, samplePerBuff, md, 3.0, false);
		}
		catch (uhd::exception &e) {
			cout << boost::format("uhd::io_error %s") % e.what() << endl;
			continue;
		}
		if (md.error_code == uhd::rx_metadata_t::ERROR_CODE_TIMEOUT)
		{
			//printf("Timeout while streaming......\n");
			break;
		}
		if (md.error_code == uhd::rx_metadata_t::ERROR_CODE_OVERFLOW)
		{
			//printf("Overflowing while stream......\n");
			continue;
		}
		if (md.error_code != uhd::rx_metadata_t::ERROR_CODE_NONE)
		{
			//printf("Receive error: %s \n", md.strerror());
			continue;
		}
		
		//std::copy(buff.begin(), buff.end(), hostBuff + toWrite);
		toWrite += numRxSamps;
		nSampsLeft -= numRxSamps;
	}
	const auto end = std::chrono::steady_clock::now();
	const auto time_diff = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
	cout << boost::format("recv cost %d milliseconds") % time_diff.count() << endl;

	stream_cmd.stream_mode = uhd::stream_cmd_t::STREAM_MODE_STOP_CONTINUOUS;  
	stream_cmd.stream_now = false;
	rx_stream->issue_stream_cmd(stream_cmd);  //to end continuous
}

patientCat avatar Dec 26 '19 09:12 patientCat

Does someone tested this problem? And the fix? We hadent time so fare but run into the problem again at an other part of our CPP software. If not necessary, I would like to prevent to powering up all the environment to develop on the FPGA again to figure out, that the fix didn't work...

Kuurusch avatar Jun 18 '21 07:06 Kuurusch

We are no longer fixing issues on 3.15, so I'm closing this. As a workaround, you can try splitting command and data traffic, and reducing the poke speed.

mbr0wn avatar Mar 19 '24 17:03 mbr0wn