gr-grnet
Sequence, size and CRC don't work correctly
When reviewing the code for the UDP sink, I thought it was strange to see several send_to calls instead of formatting the buffer (header + payload) once and using a single send_to. With a header option other than "none", it looked like it would send 2 or 3 packets instead of one.
When I do a tcpdump on the PC running the UDP sink with sequence, size and CRC, I can see a packet with length 8, then length 12, then a reassembled 32760-byte packet consisting of 23 fragments.
Was it intentional to send packets larger than the MTU? The UDP sink/source included with GRC has a maximum payload size of 1472 (the 1500-byte Ethernet MTU minus the 20-byte IP and 8-byte UDP headers) specifically to prevent fragmentation, which adds CPU overhead.
It looks like significant changes are required to get any of the header options working properly. If this is not an active project, I'd suggest changing the readme to say they are not functional, so that someone who wants to take this up doesn't spend time testing only to find they don't work properly.
Looks like a container is needed so that multiple buffers can be sent using a single send_to:
Example:

```cpp
// Write the serialized data to the socket. We use "gather-write" to send
// both the header and the data in a single write operation.
std::vector<boost::asio::const_buffer> buffers;
buffers.push_back(boost::asio::buffer(outbound_header_));
buffers.push_back(boost::asio::buffer(outbound_data_));
boost::asio::async_write(socket_, buffers, handler);
```
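For UDP, a rough sketch of the same gather idea applied to a single send_to might look like this (just an illustration; the socket, endpoint and pointer names are placeholders, not the block's actual members):

```cpp
#include <boost/asio.hpp>
#include <cstddef>
#include <vector>

// Pack the header and the sample payload into one datagram with a single send_to.
void send_one_datagram(boost::asio::ip::udp::socket &sock,
                       const boost::asio::ip::udp::endpoint &dest,
                       const unsigned char *header, std::size_t header_len,
                       const char *payload, std::size_t payload_len)
{
    std::vector<boost::asio::const_buffer> buffers;
    buffers.push_back(boost::asio::buffer(header, header_len));    // header bytes
    buffers.push_back(boost::asio::buffer(payload, payload_len));  // sample payload
    sock.send_to(buffers, dest);  // one datagram on the wire instead of two or three
}
```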
I saw what you were seeing in a packet capture. I just made a number of improvements in there; see how they work for you. It now sticks to the same 1472-byte max packet size to avoid the fragmentation and does the size/CRC within each packet, and everything looks correct in a packet capture.
Performance-wise, I think there's a lot of code running on the CPU in the UDP loops that isn't there in the TCP code, since TCP and the boost::asio::write call can handle it, versus needing to do send_to, extra buffer creates/destroys, and the CRC checks in the UDP version. My gut feel from optimizing other code is that I'm not sure I can get the UDP code to run much faster and retain the functionality. If you have any other suggestions, let me know.
Can you confirm if this is the expected/desired behaviour for the sequence numbers? With sequence only being sent, I capture on the receiver and the packets look like:

```
ff ff ff ff fc 59 00 00
ff ff ff ff fd 59 00 00
ff ff ff ff fe 59 00 00
ff ff ff ff ff 59 00 00
ff ff ff ff 00 5A 00 00
```

I don't think that's what is desired. I think we want:

```
ff ff ff ff 00 00 00 00
ff ff ff ff 00 00 00 01
```
Please confirm. When I look at TCP captures, the sequence increments the way I suggested. The same applies to the size:
For a 1472-byte packet, I see:

```
ff ff ff ff 3d 0e 00 00 b4 05 00 00
```

0x05b4 = 1460 (+ 12-byte header = Wireshark length of 1472).

For a 652-byte packet, it was:

```
ff ff ff ff 32 0e 00 00 80 02 00 00
```

For a 660-byte packet, it was:

```
ff ff ff ff 32 0e 00 00 88 02 00 00
```

0x0280 = 640, 0x0288 = 648. Add the 12-byte header, and that matches the Wireshark length.
I don't know, I just prefer seeing easy sequential numbers with my eyeballs without the additional swap. But this might be typical network behaviour. shrug Just looking to make sure it's what you intended.
Lastly, it doesn't look like the CRC is working. With seq+size, the last 4 bytes are non-zero; with CRC enabled, the last 4 bytes are zero.
I just noticed that the CRC is now 8 bytes, with 4 of those bytes being zeros? What was the reason for going to 64 bits from 32? Can we just use 4 of the 8 so there is no waste?
Also, for testing, I'm using a simple Signal source connected to UDP sink on PC1 and UDP source to QT freq sink on PC2.
PC2 only displays the expected 1 kHz signal when the header is set to none. Not sure if this means the UDP Source needs to be reworked; I haven't looked at it at all. It most likely does.
Everything you've described sounds as expected. For the size and sequence numbers, they're added to the buffer as a raw byte copy of the int values, like so:

memcpy((void *)&tmpHeaderBuff[4], (void *)&d_seq_num, sizeof(d_seq_num));

In doing so, the memory on the PC uses little-endian storage of the number, so a number you visually represent as 0x04030201 gets stored in memory as 0x01 0x02 0x03 0x04 and is therefore memcpy'd to the buffer in that little-endian order. Seeing the earlier bytes increment with each packet is just the low-order byte of the counter incrementing, which sounds correct. Using a straight memory copy into the packet keeps adding on send and retrieval on the receiving side straightforward, in that you can map it back onto an int's memory without the overhead of byte and word swapping into and out of the memory location, saving a few CPU cycles of processing.
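A minimal standalone illustration of that byte ordering (just the same memcpy pattern, not the block's actual code):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // Toy example: pack a sync marker plus a 32-bit sequence number the same way.
    uint32_t d_seq_num = 0x04030201;
    unsigned char tmpHeaderBuff[12] = {0xFF, 0xFF, 0xFF, 0xFF};  // sync bytes first

    // Raw byte copy: on a little-endian x86 host the low-order byte lands first.
    memcpy((void *)&tmpHeaderBuff[4], (void *)&d_seq_num, sizeof(d_seq_num));

    for (int i = 0; i < 8; i++)
        printf("%02x ", tmpHeaderBuff[i]);
    printf("\n");  // prints: ff ff ff ff 01 02 03 04
    return 0;
}
```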
For the checksum, I stuck with a known CRC algorithm so that folks developing their own apps could use a standard library; I use the crc32 function defined in zlib.h, which returns an unsigned long. When I was debugging yesterday I noticed something interesting: I would have expected an unsigned long to be 4 bytes and an unsigned long long to be 8 bytes, but the code was always returning an 8-byte CRC value as the unsigned long from the zlib crc function. I didn't want to lose any data or complicate splitting/recombining, so I just corrected the packet to match the 8-byte value returned. That way, if you receive it and perform the same CRC check, you can just check for equivalence between the 2 unsigned longs in a single operation (again for processing speed) without needing an extra bitwise mask to reduce it back to the first 32 bits. Again, trading off CPU processing for a few extra network bytes. A lot of this thought process has come from a ton of optimizations trying to get to as much as 250 Msps, which led to the other projects like gr-clenabled and gr-lfast (I never got anywhere close with SDR, but did get some respectable performance boosts along the way).
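If you want to check that behavior yourself, a quick standalone test against zlib looks roughly like this (the 8-byte size assumes a 64-bit LP64 Linux build):

```cpp
#include <zlib.h>
#include <cstdio>

// Build with something like: g++ crc_check.cpp -lz   (file name is just an example)
int main() {
    const unsigned char payload[] = "example payload";

    // zlib's crc32() takes and returns an unsigned long (uLong), which is
    // 8 bytes on LP64 Linux even though the CRC itself is only 32 bits.
    unsigned long crc = crc32(0L, Z_NULL, 0);          // standard zlib seed value
    crc = crc32(crc, payload, sizeof(payload) - 1);    // CRC over the payload bytes

    printf("sizeof(unsigned long) = %zu bytes, crc = 0x%08lx\n",
           sizeof(unsigned long), crc);
    return 0;
}
```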
For the PC1->UDP Sink --> PC2 UDP Source, what you've seen is correct. The UDP source block is unaware of sequence numbers and checksums so you should see the same results as the TCP pair when you use UDP with no headers/CRC. Again because processing those on receive in GNURadio could require extra queuing and calculations, it would impact overall GNURadio frame rates, so I left it out of the GNURadio source block (see the scenarios below).
My thoughts behind the blocks were more in line with this (again feel free to chime in, I'm all for continuing to improve things):
- If you need to go from GNURadio on one system to GNURadio on another system (say, to have signal processing on one system and GUI/instrumentation processing on another), use the TCP blocks. The sps rate going into the network blocks is your real bottleneck. On a gigabit ethernet connection you'll probably get around 800-900 Mbps of true throughput, which translates to around 14 Msps of full IQ (900 Mbps ÷ 8 bits per byte ÷ 8 bytes per complex sample ≈ 14 Msps). I like to tune my networks by measuring true throughput between the 2 systems with iperf3 to check the setup (and I've noticed good cables and a half-way decent switch can make a big difference, like 700 Mbps versus 900 Mbps). You'd be better off reducing the data in some way before passing it to the network block, like getting to a real data stream or a resampled lower-rate data stream first. Or, if 14 Msps over the network is too slow, consider going to a 10 Gbps switch if you really need more bandwidth (say, passing 20 Msps or 60 Msps raw).
- If you want to protect frame rates in GNURadio and the sps rate is okay on your network, you can always split your receiver signal processing from your decoding and use UDP to dump it to a decoding program running on another PC. That receiving decoder can then be truly multi-threaded and take advantage of adding a 2nd processor to the overall radio system. This is where UDP with sequence numbers and checksums can really come into play: the receiving system can be designed to efficiently queue and resequence packets (or drop packets that come out of sequence) and check checksums without impacting GNURadio receiver performance. There are all kinds of tricks you can start to take advantage of on a separate system, like GPU processing, fused multiply-add and other code optimizations, or even Intel's IPP library, that the standard GNURadio blocks don't use, to process/decode the data faster.
Hope that helps. If you have any suggestions/thoughts keep them coming. I think this is great discussion over the best way to incorporate network comms into GNURadio.
Thanks for that detailed reply. For the use case I am working on, I need to add the boost endpoint stuff to use a raw socket, and I need to add the source/destination MAC to the header (before or instead of the ff ff ff ff header, and probably a custom data type for easier debugging). I also need to take in a configurable packet size like the GRC UDP Sink does with its default payload_size of 1472 (I want to support up to 9000), but I'm currently running into issues that break flowgraph operation when trying to add the new variables in udp_sink_impl.h (it looked pretty straightforward and compiled OK...). I'm not experienced with C++ and even my C is old and newbie-ish. When I grep for the new variables you added, they only show up in two files, so I have some reading to do to figure out what I'm screwing up.
Anyways, the 1G interface will be used for management using the existing IP/port over UDP, and a 10G interface will be connected to a DSP platform's 10G NIC that is wired directly to an FPGA with no IP address and no connection to the CPU on the DSP-based platform (hence the need for raw sockets that only need the NIC interface to create the endpoint); that's where the work will need to be sent instead of the 1G interface. I am told that most of the expensive packet processing can be done in the FPGA, so I won't stress about the endianness until we find our performance bottlenecks and efficient algorithms. Sounds like this is what we want already if we want every last drop of performance.
I didn't quite understand the usage of the ff ff ff ff header. How was that intended to be used, and how would the source side know whether it was an ignorable header or actual data that just happened to be all 0xFF's? Why not use those bytes the same way as the UdpHeaderBytes values 0-3, so the recipient would know how to parse the packet into headers vs. payload? I get the impression from the boost buffers docs that on the receiver side a mutable container can be used to map the received packet onto the various header buffers, basically like the sink but in reverse (see the sketch below).
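Something along these lines is what I have in mind (a rough sketch with placeholder names; I haven't tried it against the actual source block):

```cpp
#include <boost/asio.hpp>
#include <array>
#include <cstddef>

// Scatter-read sketch: split each received datagram across a header buffer
// and a payload buffer, so the payload buffer holds only sample data.
void receive_one_datagram(boost::asio::ip::udp::socket &sock)
{
    unsigned char header[12];   // sync + sequence + size (or CRC) fields
    char payload[1472 - 12];    // the rest of the datagram is sample data

    std::array<boost::asio::mutable_buffer, 2> buffers = {{
        boost::asio::buffer(header, sizeof(header)),
        boost::asio::buffer(payload, sizeof(payload))
    }};

    boost::asio::ip::udp::endpoint sender;
    std::size_t bytes = sock.receive_from(buffers, sender);  // header fills first
    (void)bytes;  // bytes - sizeof(header) is the payload length actually received
}
```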
Ah, makes sense. One way to approach it would be to make a new UDP or raw IP module and start with the code in the UDP class. If you want to go to 9000-byte jumbo frames, make sure your switch supports them and your linux systems are set up to support them (I can't remember, but there may be a kernel setting to enable it). Then look for the number 1472 in the code, in the switch statement in the work function, and change it to 9000. That'll adjust the packet size.
For the leading 0xFF's, they're meant to be a sync sequence. You'll see something similar at the beginning of a physical ethernet frame if I remember correctly (6 bytes, I believe). If for any reason data streams get out of sync, a receiver can resync its receive queue by dumping all bytes until the sync sequence is seen. Then you can key off the size (and the CRC, if you use it). The receiver knows it's back in sync when the 0xFF sequence starts the packet and reading the number of bytes given by the size field produces a CRC that matches. Just another way to verify reception integrity for raw data streams.
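A rough sketch of that resync scan (illustrative only, not code from the blocks):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Drop bytes from the receive queue until the four 0xFF sync bytes line up at
// the front; after that the size/CRC fields can be read and validated.
static bool resync(std::vector<uint8_t> &queue) {
    for (std::size_t i = 0; i + 4 <= queue.size(); i++) {
        if (queue[i] == 0xFF && queue[i + 1] == 0xFF &&
            queue[i + 2] == 0xFF && queue[i + 3] == 0xFF) {
            queue.erase(queue.begin(), queue.begin() + i);  // discard everything before the sync
            return true;   // queue now starts at a candidate packet header
        }
    }
    // No sync found yet: keep only the last 3 bytes in case a partial sync
    // sequence is sitting at the end of the queue.
    if (queue.size() > 3)
        queue.erase(queue.begin(), queue.end() - 3);
    return false;
}
```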
I have a coding issue that I don't understand that I thought I'd quickly run by you in case you've run into this and know exactly what the issue is.
I don't have a lot of coding experience, but I can typically make small changes by looking at existing code and extending what currently works. First off, making the payload size configurable should be super easy: just add another parameter for the payload, like the gr-blocks udp_sink does. But in the grnet udp_sink_impl.h file, as soon as I add a size_t or int, the GRC flowgraph fails with:
```
Traceback (most recent call last):
  File "/home/mike/top_block.py", line 106, in <module>
    main()
  File "/home/mike/top_block.py", line 95, in main
    tb.start()
  File "/usr/local/lib/python2.7/dist-packages/gnuradio/gr/top_block.py", line 109, in start
    top_block_start_unlocked(self._impl, max_noutput_items)
  File "/usr/local/lib/python2.7/dist-packages/gnuradio/gr/runtime_swig.py", line 5681, in top_block_start_unlocked
    return _runtime_swig.top_block_start_unlocked(r, max_noutput_items)
RuntimeError: udp_sink(1): insufficient connected output ports (38102624 needed, 0 connected)
```
All I did was add my variable to udp_sink_impl.h under the existing ones:

```cpp
size_t d_itemsize;
size_t d_veclen;
size_t d_block_size;
size_t d_my_var;
```
I don't get it. You added at least three new variables (d_header_size, d_header_type and d_seq_num) and I ONLY see them show up in udp_sink_impl.h and udp_sink_impl.cc. Based on the connected output ports being a random crazy number, it's like the io_signature::make is getting borked somehow.
Also, strangely, I can add "unsigned char ethHeaderBuff[12]" right below the existing "char tmpHeaderBuff[12];" and that doesn't upset the GRC flowgraph at all, unlike adding an int or size_t, which does cause the error.
I'm beginning to wonder if this is a gnuradio bug, but it seems way too easy to trigger if that's the case. I'm leaning towards me overlooking something simple in C++. Thanks in advance.
Hmm, I suspect my setup might be bad. I was sometimes getting a python invalid-pointer issue (that required a reboot to get past), which got fixed with:

```
sudo apt-get install libtcmalloc-minimal4
export LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4"
```
So now, when I add a variable to udp_sink_impl.h and run the flowgraph, it runs about 1 in 5 times. I just keep pressing Play until it runs without error; sometimes it runs on the 2nd try, other times on the 6th attempt.
If you can add a simple one-line variable to udp_sink_impl.h (it doesn't have to be used) and run a flowgraph without an immediate error, that will confirm my setup is bad and I need to rebuild it. I would much prefer my setup to be corrupt than my basic understanding to be wrong.
Using xubuntu 16.04.3, 4GB RAM in a VM. Thanks
I may have some time later in the week to see about adding the payload as a parameter. There are actually 3 places you have to touch when adding constructor parameters:
1. The GRC XML, to pass it in.
2. udp_sink.h in the base /include/grnet directory; the constructor there needs to match.
3. The _impl files.
Once you do that, you should delete the build directory and recreate it, or the Swig python wrappers may still have the old parameter list when they get generated. I'll let you know if I have some time to add the payload size as a parameter, but in the meantime I hope that helps.
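Roughly, the pieces that have to stay in sync look like this (an abbreviated outline using payload_size as the example parameter, not a complete listing):

```cpp
// 1) grc/grnet_udp_sink.xml -- add a <param> whose key is passed in the make() call template.

// 2) include/grnet/udp_sink.h -- the public factory signature must match:
static sptr make(size_t itemsize, size_t vecLen, const std::string &host,
                 int port, int headerType, int payload_size);

// 3) lib/udp_sink_impl.h / udp_sink_impl.cc -- matching constructor plus a member to hold it:
udp_sink_impl(size_t itemsize, size_t vecLen, const std::string &host,
              int port, int headerType, int payload_size);
size_t d_payload_size;   // declared in udp_sink_impl.h, set in the constructor's initializer list
```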
Thanks, I've done those three changes. I don't know what changed, but I did get the payload size in and working. However, when I try to do the same for eof, or for the MAC address and network interface, I get problems related to the number of connected output ports (a random crazy number instead of 0). I expected to be able to just append ",d_eof(eof)", but in udp_sink_impl.h, if I uncomment "bool d_eof;", the flowgraph crashes when run. No compile errors, just problems when executed in GRC. A colleague suggested that udp_sink.h doesn't know about something it needs to know about.
Should there be a "virtual bool stop();" in udp_sink.h? I would have thought the compiler would complain if it was needed. The gr-blocks version includes several virtuals: https://gnuradio.org/cgit/gnuradio.git/tree/gr-blocks/include/gnuradio/blocks/udp_sink.h
My modification:

```cpp
udp_sink::make(size_t itemsize, size_t vecLen, const std::string &host, int port,
               int headerType, int payload_size, bool eof,
               const std::string &send_iface, const std::string &destMac)
```
I don't quite understand how d_itemsize is being set to itemsize in the _impl.cc file:

```cpp
d_itemsize(itemsize), d_veclen(vecLen), d_header_type(headerType),
d_seq_num(0), d_header_size(0), d_payload_size(payload_size)
```

The gr-blocks udp_sink does this too, so I just assumed it was OK style, and I was able to append payload_size and have it work. The tutorial makes it sound like there should be set/get functions for each variable, some virtuals, and additional "static sptr make(dtype var);" stuff in udp_sink.h. Today I am going to try changing them to set_var() and var() accessors like in the tutorial (https://wiki.gnuradio.org/index.php/BlocksCodingGuide), e.g.:

```cpp
{
    set_itemsize(itemsize);
    set_veclen(veclen);
    set_block_size(block_size);
    set_header_type(header_type);
    set_header_size(header_size);
    set_payload_size(payload_size);
    set_eof(eof);
    set_connected(connected);
    set_send_iface(send_iface);
    set_destMac(destMac);
}
```
Deleting the build dir is a good tip that I wish the gnuradio wiki pointed out. Last week I lost several hours chasing problems caused by stale swig wrappers, confused because the compile errors made no sense. So now it's standard operating procedure. I'll be sure to add that to any wiki or instructions for my colleagues.
In C++, when you see something like d_eof(eof) in the constructor, it's saying: initialize my class attribute d_eof with the variable eof passed to the constructor. Did you also add the class member variables in the header for d_eof and the others? That may be the missing piece.
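A tiny standalone example of the pattern (nothing to do with the actual block, just the mechanics):

```cpp
#include <iostream>

// The member must be declared in the class (i.e. in the header) before the
// constructor's initializer list can set it.
class example_sink {
public:
    example_sink(int payload_size, bool eof)
        : d_payload_size(payload_size), d_eof(eof)  // d_eof(eof) copies the ctor argument into the member
    {
    }
    void print() const {
        std::cout << "payload=" << d_payload_size << " eof=" << d_eof << std::endl;
    }

private:
    int d_payload_size;  // without these declarations, d_payload_size(...) and
    bool d_eof;          // d_eof(...) in the initializer list would not compile
};

int main() {
    example_sink s(1472, true);
    s.print();
    return 0;
}
```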
For the stop() function, no, you shouldn't need to add it. It's defined in the base class and is only overridden in these classes because of the way python and swig do/don't handle destructor code correctly going from C++ to python.
Can you take a look at what I'm missing in my fork? Here are the changes: https://github.com/ghostop14/gr-grnet/compare/master...BCITMike:master
I'm changing 4 files:

```
modified:   grc/grnet_udp_sink.xml
modified:   include/grnet/udp_sink.h
modified:   lib/udp_sink_impl.cc
modified:   lib/udp_sink_impl.h
```
When the flow graph is executed, it gives:

```
Traceback (most recent call last):
  File "/home/mike/top_block.py", line 110, in <module>
```
I should point out that this same step worked in my local clone of your project (much more changed, with the TCP stuff ripped out), but not at first. I tried adding payload, eof, mac and interface in one go and it didn't work. So I reverted and removed stuff and eventually it worked without error. Then I tried adding back the eof the same way, and the result is the insufficient connected output ports error again... I don't know what changed to make payload_size suddenly work, which is quite frustrating.
So if you're able to tell me what I'm doing wrong, I think I'll be able to carry out a good bit more development before getting stuck again. Thanks, much appreciated.
Strangely enough, I started off doing all the sink work because I thought it was easier to work with than the source. So I switched to making similar changes to the udp source, and so far the issue hasn't reoccurred. I think I'll proceed on the source side until I can figure out where the sink is busted.
On the source side, I was able to add payload, eof, interface and MAC address without issue to my much changed local repo.
When I went to add the IP address, doing exactly the same thing I did for the previous two string variables, python suddenly crashes with a "corrupted double-linked list" error. It stopped crashing when I commented d_host out in udp_source_impl.h. If I try to add a new string, like "d_somerandom", the double-linked list error comes back.
sigh
Added the payload size parameter to the UDP sink class and added an example GRC for it.
I'd probably need to see your constructor to get a better idea of what may be causing the crash. Can you post that?
The changes I made specifically can be seen here: https://github.com/ghostop14/gr-grnet/compare/master...BCITMike:master
The forked repo where you can see all the source is here: https://github.com/BCITMike/gr-grnet
I got a reply from someone on the gnuradio list that they've seen similar issues due to a missing liblog4cpp5-dev, so I'm trying that now. I'll also review and test your changes to see what is different.
Thanks!
Ok, good news: I'm not going insane. I emailed the gnuradio mailing list and someone replied that they also saw instability after using the gnuradio build script. After uninstalling gnuradio, installing liblog4cpp5-dev, and reinstalling gnuradio, I am able to run the flow graphs that were mysteriously failing.
Thanks!
That's awesome. I looked at your code and it looked fine. I may have a few minutes on Monday. I looked at the EOF and that looks like an easy addition too, so let me get that into the main code for you. I think I'll optimize the maxdatasize calc too and just do it once in the constructor rather than on every work() call. Just more efficient. I'll post back here after I make the changes.
I had a few minutes this morning so I incorporated the EOF code into the UDP sink module too and optimized the size calcs to just happen once in the constructor. Let me know if those changes do the trick for you.
I ended up doing a lot more sink mods for our use case, and now I'm just getting back to the source side of things. The source side only ever worked without a header, and it needs modifications to support the other header configurations. Currently the entire datagram (size, sequence and CRC headers included) gets copied into the output buffer, which corrupts the actual data. Something needs to be done to either advance the read_buffer beyond the added headers or copy just the data and not the headers. I wonder if a buffer could be made the size of the headers and another the size of the payload, with the packet copied into a mutable_buffers_type that includes those two buffers, so that the 2nd buffer contains only payload to copy to the output buffer. It would probably need a switch to set the size of the first buffer.
Maybe just need to offset the buffer by the header size?
```cpp
const char *readData = boost::asio::buffer_cast<const char *>(read_buffer.data() + d_header_size);
int blocksRead = (bytesRead - d_header_size) / d_block_size;  // subtract header overhead from bytes read
int remainder  = (bytesRead - d_header_size) % d_block_size;

for (i = 0; i < bytesRead - d_header_size; i++) {
    localQueue.push(readData[i]);
}
```

Question:

```cpp
unsigned int qnoi = blocksRead * d_block_size;
for (i = 0; i < qnoi; i++) {
    out[i] = readData[i];
}
```
How come readData is incremented by 1, instead of by [i * d_block_size] or itemsize? Doesn't this just copy a single byte instead of itemsize?
Thanks
Edit: It looks like offsetting the readData pointer by the header size works. For the CRC case, something else will need to be done. I wonder if it would be best to move the CRC to before the payload instead of after: you can still calc the crc32, just push the payload onto the buffer after a tmpHeader that already includes the CRC value, roughly like the sketch below.
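Roughly what I'm picturing on the send side (a sketch only, with placeholder names, and assuming the low 32 bits of the zlib value are enough):

```cpp
#include <boost/asio.hpp>
#include <zlib.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Compute the CRC over the payload first, pack it into the fixed-size header,
// then gather-write header + payload as one datagram.
void send_with_crc_header(boost::asio::ip::udp::socket &sock,
                          const boost::asio::ip::udp::endpoint &dest,
                          uint32_t seq_num, const char *payload, std::size_t payload_len)
{
    unsigned char hdr[12] = {0xFF, 0xFF, 0xFF, 0xFF};   // sync bytes
    memcpy(&hdr[4], &seq_num, sizeof(seq_num));         // sequence number

    unsigned long crc = crc32(0L, Z_NULL, 0);
    crc = crc32(crc, (const unsigned char *)payload, payload_len);
    uint32_t crc32_val = (uint32_t)crc;                 // keep just the low 32 bits
    memcpy(&hdr[8], &crc32_val, sizeof(crc32_val));     // CRC sits in the header, before the payload

    std::vector<boost::asio::const_buffer> buffers;
    buffers.push_back(boost::asio::buffer(hdr, sizeof(hdr)));
    buffers.push_back(boost::asio::buffer(payload, payload_len));
    sock.send_to(buffers, dest);
}
```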
Still have some other issues to resolve, like the odd wrong byte and wrong file size (using file sink and source as the data verification in my testing). Debugging that next.