contiki-ng
Fix native-border-router with TSCH
Problem
I tried to use examples/rpl-border-router and examples/slip-radio on Cooja with TSCH, but they could not form a TSCH network.
I also checked the radio message logs and confirmed that they didn't send Enhanced Beacons.
Why does it happen?
This problem happens because slip-radio doesn't have the header generation function (i.e. the framer) of IEEE 802.15.4. The framer is executed by the native rpl-border-router, not by the MAC layer of slip-radio.
When the border router sends an IPv6 packet, there is no problem. The native rpl-border-router creates a complete frame and passes it to slip-radio, which transmits it. However, when slip-radio tries to send an Enhanced Beacon, it cannot build the frame because it doesn't have the framer.
Changes
This PR makes the following changes.
- The IEEE 802.15.4 framer is moved from the native rpl-border-router to slip-radio. As a result, MAC payloads without MAC headers are now exchanged over the SLIP link.
- Packetbuf attributes (including the sender and receiver addresses) are now serialized and exchanged over the SLIP link. This is necessary because the MAC header is no longer exchanged.
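As an illustration of the second change, here is a minimal sketch of carrying the addresses and a few attributes next to a header-less MAC payload. The struct, the function names, and the byte layout are all hypothetical and are not the wire format used by this PR; 8-byte link-layer addresses and four 16-bit attributes are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINKADDR_LEN   8  /* assumption: 8-byte link-layer addresses */
#define META_NUM_ATTRS 4  /* assumption: four 16-bit attributes */
#define META_LEN (2 * LINKADDR_LEN + 2 * META_NUM_ATTRS)

/* Hypothetical metadata that travels with a header-less MAC payload. */
struct frame_meta {
  uint8_t sender[LINKADDR_LEN];
  uint8_t receiver[LINKADDR_LEN];
  uint16_t attrs[META_NUM_ATTRS];
};

/* Writes the metadata to buf; returns bytes written, or 0 if buf is too small. */
size_t meta_serialize(const struct frame_meta *m, uint8_t *buf, size_t len)
{
  size_t pos = 0;
  int i;
  if(len < META_LEN) {
    return 0;
  }
  memcpy(&buf[pos], m->sender, LINKADDR_LEN);
  pos += LINKADDR_LEN;
  memcpy(&buf[pos], m->receiver, LINKADDR_LEN);
  pos += LINKADDR_LEN;
  for(i = 0; i < META_NUM_ATTRS; i++) {
    buf[pos++] = (uint8_t)(m->attrs[i] >> 8);   /* high byte first */
    buf[pos++] = (uint8_t)(m->attrs[i] & 0xff);
  }
  return pos;
}

/* Reads the metadata back; returns bytes consumed, or 0 if buf is too short. */
size_t meta_deserialize(struct frame_meta *m, const uint8_t *buf, size_t len)
{
  size_t pos = 0;
  int i;
  if(len < META_LEN) {
    return 0;
  }
  memcpy(m->sender, &buf[pos], LINKADDR_LEN);
  pos += LINKADDR_LEN;
  memcpy(m->receiver, &buf[pos], LINKADDR_LEN);
  pos += LINKADDR_LEN;
  for(i = 0; i < META_NUM_ATTRS; i++) {
    m->attrs[i] = (uint16_t)((buf[pos] << 8) | buf[pos + 1]);
    pos += 2;
  }
  return pos;
}
```

The receiving side needs this metadata because, without the MAC header, the payload alone no longer says who sent the frame or where it should go.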
Is the issue itself that a Cooja mote running examples/slip-radio doesn't transmit EBs when the firmware is built with TSCH (MAKE_MAC = MAKE_MAC_TSCH in the Makefile)?
Thank you for the comments.
When I checked, a mote had transmitted EBs, as you pointed out.
However, the transmitted EBs lack the IEEE 802.15.4 header and are incomplete. So a receiver mote cannot parse the EB correctly, and the TSCH network never forms. I think this is the essence of the problem addressed in this PR.
Well first, big props for the very well documented code! For this PR we need to consider from a design perspective if we want indeed to switch to header-less frames over SLIP, or instead add the framer in serial radio only for EBs (or other traffic originating from the serial radio).
@joakimeriksson @nfi, thoughts?
@tsakoda Thanks for this, after including your changes, rpl border router on native with slip radio running on cc1310 seems to work well with MAKE_MAC = MAKE_MAC_TSCH and the default minimal schedule.
@tsakoda Something goes wrong when sending large packets that require fragmentation from outside the border router to a node: the packet never arrives, and sometimes the tun interface breaks with [ERR : P-utils ] *** unknown attribute 32 and border-router.native-ng: slip-dev: illegal packetbuf_attrs
wrote 90 bytes to ('fda7:eb5e:d7ef:0:212:4b00:197b:4b0a', 3000, 0, 0)
[INFO: TCP/IP ] input: received 138 bytes
[INFO: IPv6 ] packet received from fda7:eb5e:d7ef::1 to fda7:eb5e:d7ef::212:4b00:197b:4b0a
[INFO: IPv6 ] Forwarding packet to next hop fda7:eb5e:d7ef::212:4b00:197b:4b0a
[INFO: IPv6 ] Sending packet with length 138 (98)
[INFO: RPL ] SRH creating source routing header with destination fda7:eb5e:d7ef::212:4b00:197b:4b0a
[INFO: RPL ] SRH path len: 0, ComprI 15, ComprE 15, ext len 8 (padding 0)
[INFO: TCP/IP ] output: processing 146 bytes packet from fda7:eb5e:d7ef::1 to fda7:eb5e:d7ef::212:4b00:197b:4b0a
[INFO: TCP/IP ] output: selected next hop from SRH: fe80::212:4b00:197b:4b0a
[INFO: TCP/IP ] output: sending to 0012.4b00.197b.4b0a
[INFO: 6LoWPAN ] output: sending IPv6 packet with len 146
[INFO: 6LoWPAN ] output: header len 56 -> 53, total len 146 -> 143, MAC max payload 106, frag_needed 1
[INFO: 6LoWPAN ] output: fragmentation needed, fragments: 2, free queuebufs: 15
[INFO: 6LoWPAN ] output: fragment 1/2 (tag 1, payload 48)
[INFO: BR-MAC ] sending packet (105 bytes)
[INFO: P-utils ] packetutils: serializing packet atts
[INFO: P-utils ] serialized 2 packet atts
[INFO: P-utils ] serialized 2 packet addrs
Packet from TUN of length 134 - write SLIP
[INFO: 6LoWPAN ] output: fragment 2/2 (tag 1, payload 42, offset 104)
[INFO: BR-MAC ] sending packet (47 bytes)
[INFO: P-utils ] packetutils: serializing packet atts
[INFO: P-utils ] serialized 2 packet atts
[INFO: P-utils ] serialized 2 packet addrs
Packet from TUN of length 76 - write SLIP
[DBG : slip-radio] SR-SIN: 134 '!S'
[DBG : slip-radio] sending 7 (105 bytes)
[INFO: Frame 15.4] Out: 1 0012.4b00.197b.4b0a 21 105 (126)
[INFO: TSCH ] send packet to 0012.4b00.197b.4b0a with seqno 56, queue 3 3, len 21 126
[DBG : slip-radio] SR-SIN: 76 '!S'
[DBG : slip-radio] sending 8 (47 bytes)
[INFO: Frame 15.4] Out: 1 0012.4b00.197b.4b0a 21 47 (68)
[INFO: TSCH ] send packet to 0012.4b00.197b.4b0a with seqno 57, queue 4 4, len 21 68
[INFO: TSCH ] packet sent to 0012.4b00.197b.4b0a, seqno 54, status 4, tx 8
[DBG : slip-radio] packet sent! sid: 5, status: 4, tx: 8
[INFO: RPL ] packet sent to 0012.4b00.197b.4b0a, status 4, tx 8, new link metric 146
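The fragment sizes in this log can be reproduced with a little arithmetic. The sketch below is only an illustration under stated assumptions, not the Contiki-NG 6LoWPAN code: it assumes the RFC 4944 fragment header sizes (4 bytes for the first fragment, 5 for later ones) and that fragment offsets must be multiples of 8 bytes of the uncompressed datagram. The function names are made up for the example, and it only covers the case where fragmentation is actually needed.

```c
#include <stdint.h>

#define FRAG1_HDR_LEN 4u  /* 6LoWPAN first-fragment header (RFC 4944) */
#define FRAGN_HDR_LEN 5u  /* 6LoWPAN subsequent-fragment header (RFC 4944) */

/* Uncompressed bytes (headers + payload) covered by the first fragment.
 * mac_max: MAC max payload, comp_hdr/uncomp_hdr: header length after and
 * before compression. The result is rounded down to a multiple of 8. */
uint16_t first_fragment_span(uint16_t mac_max, uint16_t comp_hdr,
                             uint16_t uncomp_hdr)
{
  uint16_t room = (uint16_t)(mac_max - FRAG1_HDR_LEN - comp_hdr);
  return (uint16_t)((uncomp_hdr + room) & ~7u);
}

/* Bytes handed to the MAC layer for the first fragment. */
uint16_t first_fragment_frame_len(uint16_t mac_max, uint16_t comp_hdr,
                                  uint16_t uncomp_hdr)
{
  uint16_t span = first_fragment_span(mac_max, comp_hdr, uncomp_hdr);
  return (uint16_t)(FRAG1_HDR_LEN + comp_hdr + (span - uncomp_hdr));
}

/* Number of fragments for a datagram of total_uncomp uncompressed bytes. */
uint16_t fragment_count(uint16_t total_uncomp, uint16_t mac_max,
                        uint16_t comp_hdr, uint16_t uncomp_hdr)
{
  uint16_t span = first_fragment_span(mac_max, comp_hdr, uncomp_hdr);
  uint16_t per_frag = (uint16_t)((mac_max - FRAGN_HDR_LEN) & ~7u);
  uint16_t rest = (uint16_t)(total_uncomp - span);
  return (uint16_t)(1u + (rest + per_frag - 1u) / per_frag);
}
```

With the values from the log (MAC max payload 106, header 56 -> 53, total 146), the first fragment covers 104 uncompressed bytes, which matches the reported offset of the second fragment, and is handed to the MAC layer as 4 + 53 + 48 = 105 bytes. The second fragment is 5 + 42 = 47 bytes.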
@Johan-Henry Thank you for debugging. I also ran a test that transmits large packets using Cooja, but this particular error did not occur.
Did the error reported above occur on real devices? In addition, please tell me in detail about the hardware and environment in which the error occurs.
@tsakoda Everything runs on physical hardware. I have a cc1310 slip radio running with MAKE_MAC = MAKE_MAC_TSCH and a native border router running on a Raspberry Pi 3. Both the slip radio and the native border router run code with your changes. I have 4 different cc1310 sensing devices running the current Contiki-NG develop branch, also with MAKE_MAC = MAKE_MAC_TSCH. I send UDP packets from the Raspberry Pi to the sensing devices. Everything works well, but large packets are dropped as explained previously. As soon as I switch back to using an rpl-border-router where the complete stack runs on the cc1310, the same scenario works perfectly.
@Johan-Henry I also tried with real devices, but the same situation did not occur. Please check the following two points.
- Can you reproduce the same environment on Cooja, and does it run correctly there?
- Does the SLIP code you are using operate correctly?
I couldn't reproduce the error, so it's difficult for me to debug the problem. I would be happy if you could help with debugging.
Sorry for joining the discussion late. From my perspective, the combination of TSCH, which requires solid timing, and the SLIP serial link between Linux and the slip-radio device will not work very well. The timing between Linux and the slip-radio device will have too much variation to ensure good timing between trying to send and actually sending. We will need to come up with something that separates the TSCH transmission/reception schedule from the transmission of packets over the serial link. Either send the next X slots over serial all the time, or now and then configure the schedule on the slip-radio so that it is always up to date, and have the slip radio report its current global "clock" so that the NBR can at all times figure out which slot we are at, timing-wise. Then, when the schedule is there, we can just send "send packet in slot X" over SLIP, typically transferring the data over the serial link at least tens of milliseconds before the packet needs to be sent. Otherwise we might not get the packet over in time.
At Yanzi we have a similar feature - but it is just for send packet at time X (basically schedule a delayed transmission). For TSCH we however probably need to do both schedule and delayed transmissions as the schedule also defines when to listen at what channel.
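To put a rough number on the serial-link delay discussed here, the following sketch estimates how long a SLIP frame takes to cross the UART. It is a back-of-the-envelope estimate under stated assumptions, not a measurement: it assumes 8N1 framing (10 bits on the wire per byte), ignores SLIP escaping and operating-system latency, and the function names are hypothetical.

```c
#include <stdint.h>

#define BITS_PER_BYTE_8N1 10u  /* assumption: start bit + 8 data bits + stop bit */

/* Time in microseconds to push nbytes over a UART running at baud bits/s. */
uint32_t serial_transfer_us(uint32_t nbytes, uint32_t baud)
{
  return (uint32_t)(((uint64_t)nbytes * BITS_PER_BYTE_8N1 * 1000000u) / baud);
}

/* Non-zero if the frame is fully transferred before the target slot starts. */
int arrives_before_slot(uint32_t nbytes, uint32_t baud, uint32_t us_until_slot)
{
  return serial_transfer_us(nbytes, baud) <= us_until_slot;
}
```

For example, the 134-byte SLIP write seen in the log earlier in this thread would take about 11.6 ms at 115200 baud (the baud rate is an assumption; the thread does not state it). That is already longer than a 10 ms timeslot, which is consistent with the suggestion to hand packets over tens of milliseconds ahead of the slot in which they should be sent.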
But the issue here, I believe, is about where to run the framer, not about timing. I don't think there is any timing issue really, btw, as the MAC runs entirely in the SR. The packet is sent over serial, then queued and sent by the SR as per the TSCH schedule.
Assuming that we always send packets early, it should work fine if we have all of TSCH down in the SR. How much buffer space do we need in the SR? Running the framer in the SR seems OK. If needed, we might even be able to run the whole 6lowpan part of the stack in the SR?
Could the fragmentation issue above be due to some configuration of max packet size? (so that the packet gets too large for some reason?) Or is it possibly just SLIP / serial byte errors? I guess adding debug in many places in SR - specifically any place where packets are dropped due to size would be good for getting an idea about the issue.
I think we should run the framer on slip-radio. That is because other parts of the MAC, for example the TX queues and the TX timing control, are already included in slip-radio. It looks strange that only the framer is excluded from slip-radio.
Hi @joakimeriksson. I'm working with @tsakoda on this issue.
The TSCH layer in the SR needs as much buffer as the TSCH layer in a normal node. However, the SR needs additional buffer space to store frames it gets from the native part over SLIP.
We could run 6lowpan in the SR, but I wouldn't do that, because 6lowpan requires relatively large buffer space for fragmentation.
I'm not sure about the error that Johan reported above. It's a good idea to enable (or insert) as many debug messages as possible to see what's going on.
Hi @debug-ito, unfortunately I have not had time to look at the issue I described above again, but as soon as I do I will investigate further.
@debug-ito sure - I understand that issue - and I guess it should not be needed either - re-assembly buffers will be a bit costly to have, and you will also need a full IPv6 packet buffer which is probably not used for anything else. Removing frames for 802.15.4 (in NBR) should probably be the best way to go. Quick question: do we have a way to discover if we have headers or not? So that we do not mess things up when using old/new serial radios with new NBR?
do we have a way to discover if we have headers or not?
Currently no. The user must ensure that they are using the right combination of NBR and SR.
Ok, I guess that is fine for now. We will likely work on improving the NBR <-> SR protocol soon - we can add it then!
I got time to take a closer look to this PR two weeks ago; and decided to propose another approach, which is PR #1021.
I think it would be better to avoid splitting the border router into two parts connected by a serial link when dynamic TSCH scheduling is used, because the scheduling function may interact with IPv6/RPL. For instance, the scheduling function may trigger a cell allocation on a neighbor cache operation or on an RPL parent change. This limitation of the current approach is already mentioned in the Wiki, https://github.com/contiki-ng/contiki-ng/wiki/Tutorial:-RPL-border-router#native-border-router.
Looking forward to hearing from you all.
Hi @debug-ito I am looking into this pull request again, but still struggling with getting it to work on cc1310 boards when sending large packets that require fragmentation as discussed above. I am getting the following error on client nodes that receive the large fragmented packets:
[WARN: 6LoWPAN ] reassembly: failed to store N-fragment - could not find session - tag: 38 offset: 16
[ERR : 6LoWPAN ] input: reassembly context not found (tag 38)
Any idea why? I am not too familiar with the Contiki stack, so I am finding it a bit difficult to trace the problem.
Thanks for the follow-up report, @Johan-Henry.
The "client nodes" you mentioned are CC1310s that communicate with the border router by 6TiSCH, right? That is,
[native-border-router on RPi]----SLIP----[slip-radio on CC1310] ....6TiSCH .... [client on CC1310]
(1)---> (2)---->
<---(4) <----(3)
Is the above configuration correct?
If that is correct, your report looks different from the one you reported in May. In May, I thought that the error was detected at native-border-router for a packet flow of (4), but now the error is reported at client for a packet flow of (2). Is it correct?
Anyway, if the "client" complains about a packet in (2), I suspect it's already corrupted in (1). Make sure slip-radio correctly receives every byte of every frame from native-border-router. Sometimes the baud rate of the SLIP line is so fast for slip-radio that its UART buffer overflows.
Thanks for your reply @debug-ito. Apologies for the confusion. Yes, the situation described above is correct: the client complains about the packet in (2), and only if fragmentation is required. I am pretty sure that the fragmented packets are corrupted in (1), or that the metadata required for successful reassembly of a fragmented packet is lost. This error only occurs with the implementation of this pull request. If I replace the [native-border-router on RPi]----SLIP----[slip-radio on CC1310] combination with [tunslip6] ---SLIP---[embedded-border-router on cc1310], so that the complete border router runs on the embedded device, the issue does not occur and everything works 100%. I am looking into it some more and will report if I find something else. All cc1310 devices run 6TiSCH with UDP packets being sent.
I finally solved the issue: in os/services/rpl-border-router/native/border-router-mac.c, line 183, changing 127 to 125 fixes it for the cc1310. There was a mismatch in the max packet length sent by the native border router to the slip radio, causing the slip radio to drop large packets. Maybe we can make the max payload configurable with a #define?
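A configurable limit along these lines could look like the sketch below. The macro and function names are hypothetical and are not the ones in border-router-mac.c; the default of 125 simply mirrors the value reported to work on the cc1310.

```c
#include <stdint.h>

/* Hypothetical configuration macro: override it at build time to match
 * the largest frame the slip radio accepts. */
#ifndef BR_MAC_CONF_MAX_FRAME_LEN
#define BR_MAC_CONF_MAX_FRAME_LEN 125
#endif

/* Non-zero if a frame with this MAC header and payload fits the limit. */
int frame_fits(uint16_t mac_hdr_len, uint16_t payload_len)
{
  return (uint32_t)mac_hdr_len + payload_len <= BR_MAC_CONF_MAX_FRAME_LEN;
}

/* Largest payload that still fits for a given MAC header length. */
uint16_t max_payload_for(uint16_t mac_hdr_len)
{
  if(mac_hdr_len >= BR_MAC_CONF_MAX_FRAME_LEN) {
    return 0;
  }
  return (uint16_t)(BR_MAC_CONF_MAX_FRAME_LEN - mac_hdr_len);
}
```

This lines up with the log earlier in the thread: the first fragment went out as a 21-byte header plus 105 bytes, 126 bytes in total, which fits a limit of 127 but not 125. With a limit of 125 and the same 21-byte header, the border router would cap the payload at 104 bytes instead of 106. The 2-byte difference may correspond to the IEEE 802.15.4 frame check sequence, but the thread does not confirm that.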
Great!
Maybe we can make the max payload configurable with a #define?
Yes, we can do that right away. @tsakoda, could you add the fix to this branch?
However, it's inconvenient for users to have to adjust the max payload size of border-router-mac based on the implementation of the MAC and PHY in slip-radio. I can think of two ways to solve this problem.
1. Define some kind of negotiation protocol between the native border router and slip-radio to exchange the max payload size.
2. Use the standalone native border router (#1021)
I prefer 2.
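For reference, the first option could be as small as exchanging one capability record at startup. The sketch below is purely hypothetical: no such protocol exists in this PR, and the struct, its fields, and the function name are invented for illustration. It also folds in the earlier question of whether frames on the SLIP link carry MAC headers.

```c
#include <stdint.h>

/* Hypothetical capability record each side would announce over SLIP. */
struct link_caps {
  uint8_t proto_version;  /* version of the NBR <-> SR protocol */
  uint8_t headerless;     /* 1 if frames on the SLIP link carry no MAC header */
  uint16_t max_payload;   /* largest MAC payload this side can handle */
};

/* Combines both announcements into the parameters to use on the link.
 * Returns 0 on success, -1 if the two sides disagree about MAC headers. */
int link_caps_negotiate(const struct link_caps *br, const struct link_caps *sr,
                        struct link_caps *out)
{
  if(br->headerless != sr->headerless) {
    return -1;  /* mixing old and new NBR/SR would corrupt frames */
  }
  out->headerless = br->headerless;
  out->proto_version = br->proto_version < sr->proto_version ?
                       br->proto_version : sr->proto_version;
  out->max_payload = br->max_payload < sr->max_payload ?
                     br->max_payload : sr->max_payload;
  return 0;
}
```

Taking the smaller of the two payload limits means the border router never produces a frame the slip radio would drop, at the cost of one extra exchange before traffic starts.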
@debug-ito no problem. I will respond immediately.
Hi!
This was my mistake, very sorry about it, re-opening this PR now.
What happened is the following: I was switching the base branch for this repository https://github.com/wittra/contiki-ng from develop to wittra. But accidentally, I made the change on the wrong repo (this repo) and instead of switching the base branch I renamed it. And somehow github deleted develop and closed all PRs...
Many apologies for this mishap 🙏; I haven't contributed in a while.. but now at least everybody got some notification from me :p