contiki-ng
contiki-ng copied to clipboard
6TiSCH Minimal Scheduling Function (MSF)
This PR implements MSF on top of Issue #987. All the features except for "negotiated Rx cell" are supported. Here are diffs between the two branches: https://github.com/yatch/contiki-ng/compare/pr/sixtop-update...yatch:pr/msf
This PR also have changes to put a "priority" packet to the top of TX queue.
tests/16-6tisch-msf/01-cooja-test-msf-base.csc
and tests/16-6tisch-msf/02-cooja-test-msf-relocate.csc
would demonstrate how it works.
This is under testing and evaluation.
Thanks for great work! I reviewed the code (https://github.com/yatch/contiki-ng/tree/9a285a08850c8f4a64542f5d9784b6630bf64169). Here are my comments.
About PACKETBUF_ATTR_PRIORITY
PACKETBUF_ATTR_PRIORITY
attribute is added. 6P frames set this flag, and tsch-queue treats them prioritized. This behavior is enabled whether or not msf is used.
Is there any specification that prioritizes 6P frames over other frames? I think I have seen it somewhere, but cannot remember it...
msf-negotiated-cell: two new data models
msf-negotiated-cell introduces two data models:
-
(list of) negotiated tx cells
-
It's a linked list alternating between
tsch_link
andmsf_negotiated_tx_cell_data_t
.-
tsch_link.data
points tomsf_negotiated_tx_cell_data_t
. -
msf_negotiated_tx_cell_data_t.next
points to the nexttsch_link
, or NULL if it's at the tail of the list.
-
-
One TSCH neighbor has zero or one list of negotiated tx cells.
-
Its head is kept by
tsch_neighbor.negotiated_tx_cell
. -
The
tsch_link
in the list is a Tx TSCH link added by a 6P transaction. -
The list is sorted in descending order of
tsch_link.timeslot
field.
-
-
(list of) reserved cells
-
It's a linked list of
tsch_link
, whoselink_options
== 0. -
tsch_link.data
is either of the following three cases- NULL (the
tsch_link
is the tail of the list) - points to a
tsch_link
whoselink_options
== 0 (that's the next reserved cell in the list) - points to a
tsch_link
whoselink_options
!= 0 (thetsch_link
is the tail (and the head as well), and itsdata
is a negotiated cell to be deleted. Used in RELOCATE transaction)
- NULL (the
-
One TSCH neighbor has zero or one list of reserved cells.
-
It's kept in
tsch_slotframe.links_list
just like othertsch_link
s, but you can filter them bylink_options
and peer address.
-
Because they have significantly different properties from normal tsch_link
s, I would assign them their own type names by typedef
. I would also add source and header files dedicated to them, and define API to manipulate them. That way, it would be easier to prevent unsafe operations and keep the properties above intact.
msf-negotiated-cell: sent callbacks
add_sent_callback
and relocate_sent_callback
are shared by the initiator and the responder. I would define different callback functions for them. That way, we don't have to use the following condition to check if it's for initiator or responder.
NETSTACK_ROUTING.node_is_root() ||
(parent_addr == NULL &&
linkaddr_cmp(dest_addr, parent_addr) == 0)
msf-negotiated-cell: release reserved cells
In add_recv_response
and relocate_recv_response
, there are some cases where the reserved cells are not released. Because the 6P transaction is terminated here, there is no timeout, so I think the reserved cells for that peer remain in those cases.
I think we can clear the reserved cells at the top of add_recv_response
and relocate_recv_response
functions. To do that, we might as well define clear_reserved_cells
as
static void clear_reserved_cells(const linkaddr_t *peer_addr)
{
tsch_link_t *cell;
assert(peer_addr != NULL);
while((cell = find_reserved_cell_head(peer_addr)) != NULL) {
// Maybe `while` loop is too consertative... We can use `if` instead.
release_reserved_cells(cell);
}
}
and replace release_reserved_cells(find_reserved_cell_head(peer_addr))
with clear_reserved_cells(peer_addr)
. clear_reserved_cells
is idempotent, so we can casually call that to release the reserved cells.
In addition, timeout_handler
should clear reserved cells regardless of cmd
. At least if cmd
is RELOCATE, there can be some reserved cells, so we have to clear them.
msf-negotiated-cell: error handling
-
error_handler
is empty. We should at least print a log message. I don't think we have to take more action than that. - "quarantine" (section 12, msf-04) is not implemented. Is it a TODO?
- We might as well handle
RC_ERR_LOCKED
response inadd_recv_response
andrelocate_recv_response
, because we already handleRC_ERR_BUSY
. -
delete_recv_response
should handleRC_ERR_SEQNUM
. - msf-negotiated-cell.c L1528: It should send 6P CLEAR, just like it does at L1244.
Others
- msf.h: It defines four CALLBACK macros. I think we can define them in module-macros.h. That way, we don't have to modify rpl-class, rpl-lite or tsch.h.
- tsch.c:
msf_callback_packet_sent
is not called for TSCH keepalive frames. Is it OK? - tsch.c L1105: If you fix this, maybe you should fix L1098, too.
- msf.c L174:
num_cells_used
is only for the preferred parent, butmsf_negotiated_cell_update_num_cells_used
is called here regardless ofdest_mac_addr
. Is it OK? - msf-negotiated-cell.c L323:
cell_to_add->data
can be NULL here. It's dangerous. -
msf_autonomous_cell_add
: The case where the new AutoTxCell conflicts with the AutoRxCell is not handled. Is it a TODO? - msf-negotiated-cell.c L613: The correct comment is "look for an unused slot", I think.
- msf-negotiated-cell.c L866: The correct condition is
cell_list_len >= sizeof(sixp_pkt_cell_t) * MSF_6P_CELL_LIST_MIN_LEN
, I think. I don't think we need this assertion, becauseadd_send_response
handles that case anyway. - msf-negotiated-cell.c L1166: Here, we should insert
cell = NULL;
. Otherwise, it's possible thatcell != NULL && cell->channel_offset != channel_offset
is true. In that case, L1266 will remove the cell, that is not actually the one requested by 6P DELETE. - msf-negotiated-cell.c L1577: This assertion fails if the node is the responder and it responds with an empty CellList.
-
prepare_cell_list_to_return
: In fact it returns zero or one cell. Maybe we should rename it toprepare_cell_to_return
to match the function signature. - msf-negotiated-cell.h L48:
new_parent
is misleading. It should bepeer_addr
. - msf.h L87, L100: typo: "leavning" -> "leaving"
- msf-negotiated-cell.c L544: We might as well check if
reserved_cell_head->link_options == 0
, just in case. - BTW, I didn't review sax.c. Sorry about that.
Hi Yatch,
Obviously this is a desirable feature and thanks for your hard work. I have not made a full pass on the code, but here are some initial comments / points for discussion. These might already be quite a lot!
Minor quibbles and annoyances:
- I think the init function should print the parameters used by the code, such as the timeout of the MSF
- Contiki-NG coding style reserves
<>
includes only for system libraries; use "" for all Contiki-NG files - the 6tisch minimal schedule has to be enabled (
TSCH_SCHEDULE_WITH_6TISCH_MINIMAL
set to nonzero) for MSF to work, but there is no warning / message when its set to 0. This could be made more user friendly (for example, a compilation warning added) - the code organization in Contiki-NG is not fully clear to me. I think it makes some sense to have MSF under
os\services
, especially as Orchestra is already there, but why is the 6top protocol undernet\mac\tsch
? -
const sixtop_sf_t msf = {
- maybe rename this to something more specific, e.gmsf_scheduling_function
,minimal_scheduling_function
or same such to make it more easily grepable -
LOG_INFO("delay the next request for %u seconds\n", (unsigned int)wait_duration);
- should divide by CLOCK_SECOND
Major issues:
- first of all, the code as provided does not compile on ARM. I'm using
arm-none-eabi-gcc
version 4.9.3 if that makes any difference. - the size of the logging code is an issue when trying to run this code on embedded hardware. I don't know of a particularly good way to solve it. Perhaps own log level for MSF would be a step in that direction. potentially, the logging can be reduced in the future, when the code gets more stable (e.g. move some things from INFO level to DBG level - although a lof of that logging is actually at the ERR level!). -- anyway, it would be nice to get MSF demo application the actually successfully link and work on the CC2650 launchpad and similar platforms!
- the
PACKETBUF_ATTR_PRIORITY
attribute - ok, we need this, but probably not the way it's implemented - by dropping an existing packet from queue. That seems like a hack and is inappropriate for networks aiming for high reliability. - last but not least, the code was hitting assertion errors (an issue was already pointed out by debug-ito). An incorrect protocol behavior on some nodes obviously should not be able to trigger assertions on other nodes. (I'm not sure this is not the case right now - need to look at the code more carefully to conclude what was the reason).
I will send you a diff of the changes that I did to make it compile and not bail out with assertion errors.
@debug-ito @atiselsts Thank you for your comments. I will address them.
Contiki-NG coding style reserves <> includes only for system libraries; use "" for all Contiki-NG files
Oh, I didn't know that. Thank you for pointing it out! 👍
We have published some initial MSF results as part of our paper TSCH Networks for Health IoT: http://www.edi.lv/wp-content/uploads/2019/10/sphere-tsch-acm.pdf (Figure 4).
I must say that the PRs version of MSF is still showing worse results that I would expect for the SPHERE application; though the results are quite different from my last year's micro-implementation of MSF internet draft version 02.
@atiselsts Thank you for sharing! I will look after my autumn holidays.
Hallow Tanaka! i try this msf with my task - chain of 8 nodes -> border-router. it works much worse then old sixtop/node with simple-sf. Where i can give you my simulation project? can you open issues in your github repo?
@alexrayne This code is outdated. A better implementation is on my machine, which is waiting to be evaluated.
can you open issues in your github repo?
It'd be nice if you could test the new one once it is pushed here. Thanks for testing.
@alexrayne This code is outdated. A better implementation is on my machine, which is waiting to be evaluated.
can you open issues in your github repo?
It'd be nice if you could test the new one once it is pushed here. Thanks for testing.
i`ve no rights, to push here anything. provide my test example as PR for you? i was wrong about sixtop/simple-sf - it unpredictable hung cooja simulation, without any hope of diagnse reason. your msf not hungs, but it is loose 3rd-hop node, when works with some little traffic. without traffic it can slowly discover deep chain. and even works with it with little traffic.
@yatch : can you explain about msf_callback_tsch_nbr_removed(nb)
callback?
why not used NETSTACK_CONF_ROUTING_NEIGHBOR_REMOVED_CALLBACK
, like orchestra do?
if it need, why not introduced somewhat TSCH_CONF_ROUTING_NEIGHBOR_REMOVED_CALLBACK
, as more generic way to insert msf callback?
Halow Tanaka! here i've provide some work on sixp, 6Pex. There provided support of concurent transfers, some optimisations, and some fix. Maybe it could interest you.
Hallow Tanaka! i`ve finished developing your MSF service here MSF. now this well schedules net and resolves collosions on deep net chains - that i need. this proposed with:
- module
avoid
establish register for busy cells, that take to account when allocate new cell. new cells are mus be free, and not collision with other cells. - detects collision for autonomous RX cells, and negotiate cells, to provide connection with parent without interferences.
- provides more stable strategys for relocate cells (and allocate new one), providing ability to migrate chanel in same slot. If no free slots have, only such migration allow avoid from interference on busy chanel
- provides NRSF protocol - this one shares busy/avoid cells register over neighbors. So every node have busy cells map over 1-3hop nbrs. And can choose new cells with no conflicts, or detects when need for relocation.
This work now tested only in cooja. now i starts look it in launchpad kits.
There is a lot of codestyle inconsistence with coktiki one. I not position this code as PR, but maybe you interests it solutions, to develop your one.
I can rebuild this changes-tree on contiki-ng develop banch, on demand. current implementation uses tsch stack very close to NG.
Hallow Tanaka!
i've relocate my MSF code on latest contiki-NG here MSF. and NRSF.
msf example builds well.
@alexrayne What is NRSF?
@alexrayne What is NRSF?
provides NRSF protocol - this one shares busy/avoid cells register over neighbors. So every node have busy cells map over 1-3hop nbrs. And can choose new cells with no conflicts, or detects when need for relocation.
i designed it to work in densed environment - where space of nodes ( slots * chanels ) not much, and conflicts are frequent. To resolve nodes (slots * chanels) in such suituation, i spreads maps of allocated nodes between nehgbors, so they can allocate nodes avoiding conflicts.
@alexrayne This code is outdated. A better implementation is on my machine, which is waiting to be evaluated.
can you open issues in your github repo?
It'd be nice if you could test the new one once it is pushed here. Thanks for testing.
@yatch any word yet on the updated code? I might need MSF relatively soon and I could implement it myself but it's probably a better idea to use your latest version as a starting point at least
@tinstructor I think I have no time to do something on this in the coming months. Hope the current version could help.
Hi!
This was my mistake, very sorry about it, re-opening this PR now.
What happened is the following: I was switching the base branch for this repository https://github.com/wittra/contiki-ng from develop to wittra. But accidentally, I made the change on the wrong repo (this repo) and instead of switching the base branch I renamed it. And somehow github deleted develop and closed all PRs...
Many apologies for this mishap 🙏; I haven't contributed in a while.. but now at least everybody got some notification from me :p