contiki-ng icon indicating copy to clipboard operation
contiki-ng copied to clipboard

6TiSCH Minimal Scheduling Function (MSF)

Open yatch opened this issue 5 years ago • 17 comments

This PR implements MSF on top of Issue #987. All the features except for "negotiated Rx cell" are supported. Here are diffs between the two branches: https://github.com/yatch/contiki-ng/compare/pr/sixtop-update...yatch:pr/msf

This PR also have changes to put a "priority" packet to the top of TX queue.

tests/16-6tisch-msf/01-cooja-test-msf-base.csc and tests/16-6tisch-msf/02-cooja-test-msf-relocate.csc would demonstrate how it works.

This is under testing and evaluation.

yatch avatar Jul 05 '19 20:07 yatch

Thanks for great work! I reviewed the code (https://github.com/yatch/contiki-ng/tree/9a285a08850c8f4a64542f5d9784b6630bf64169). Here are my comments.

About PACKETBUF_ATTR_PRIORITY

PACKETBUF_ATTR_PRIORITY attribute is added. 6P frames set this flag, and tsch-queue treats them prioritized. This behavior is enabled whether or not msf is used.

Is there any specification that prioritizes 6P frames over other frames? I think I have seen it somewhere, but cannot remember it...

msf-negotiated-cell: two new data models

msf-negotiated-cell introduces two data models:

  • (list of) negotiated tx cells

    • It's a linked list alternating between tsch_link and msf_negotiated_tx_cell_data_t.

      • tsch_link.data points to msf_negotiated_tx_cell_data_t.
      • msf_negotiated_tx_cell_data_t.next points to the next tsch_link, or NULL if it's at the tail of the list.
    • One TSCH neighbor has zero or one list of negotiated tx cells.

    • Its head is kept by tsch_neighbor.negotiated_tx_cell.

    • The tsch_link in the list is a Tx TSCH link added by a 6P transaction.

    • The list is sorted in descending order of tsch_link.timeslot field.

  • (list of) reserved cells

    • It's a linked list of tsch_link, whose link_options == 0.

    • tsch_link.data is either of the following three cases

      1. NULL (the tsch_link is the tail of the list)
      2. points to a tsch_link whose link_options == 0 (that's the next reserved cell in the list)
      3. points to a tsch_link whose link_options != 0 (the tsch_link is the tail (and the head as well), and its data is a negotiated cell to be deleted. Used in RELOCATE transaction)
    • One TSCH neighbor has zero or one list of reserved cells.

    • It's kept in tsch_slotframe.links_list just like other tsch_links, but you can filter them by link_options and peer address.

Because they have significantly different properties from normal tsch_links, I would assign them their own type names by typedef. I would also add source and header files dedicated to them, and define API to manipulate them. That way, it would be easier to prevent unsafe operations and keep the properties above intact.

msf-negotiated-cell: sent callbacks

add_sent_callback and relocate_sent_callback are shared by the initiator and the responder. I would define different callback functions for them. That way, we don't have to use the following condition to check if it's for initiator or responder.

NETSTACK_ROUTING.node_is_root() ||
 (parent_addr == NULL &&
  linkaddr_cmp(dest_addr, parent_addr) == 0)

msf-negotiated-cell: release reserved cells

In add_recv_response and relocate_recv_response, there are some cases where the reserved cells are not released. Because the 6P transaction is terminated here, there is no timeout, so I think the reserved cells for that peer remain in those cases.

I think we can clear the reserved cells at the top of add_recv_response and relocate_recv_response functions. To do that, we might as well define clear_reserved_cells as

static void clear_reserved_cells(const linkaddr_t *peer_addr)
{
  tsch_link_t *cell;
  assert(peer_addr != NULL);
  while((cell = find_reserved_cell_head(peer_addr)) != NULL) {
    // Maybe `while` loop is too consertative... We can use `if` instead.
    release_reserved_cells(cell);
  }
}

and replace release_reserved_cells(find_reserved_cell_head(peer_addr)) with clear_reserved_cells(peer_addr). clear_reserved_cells is idempotent, so we can casually call that to release the reserved cells.

In addition, timeout_handler should clear reserved cells regardless of cmd. At least if cmd is RELOCATE, there can be some reserved cells, so we have to clear them.

msf-negotiated-cell: error handling

  • error_handler is empty. We should at least print a log message. I don't think we have to take more action than that.
  • "quarantine" (section 12, msf-04) is not implemented. Is it a TODO?
  • We might as well handle RC_ERR_LOCKED response in add_recv_response and relocate_recv_response, because we already handle RC_ERR_BUSY.
  • delete_recv_response should handle RC_ERR_SEQNUM.
  • msf-negotiated-cell.c L1528: It should send 6P CLEAR, just like it does at L1244.

Others

  • msf.h: It defines four CALLBACK macros. I think we can define them in module-macros.h. That way, we don't have to modify rpl-class, rpl-lite or tsch.h.
  • tsch.c: msf_callback_packet_sent is not called for TSCH keepalive frames. Is it OK?
  • tsch.c L1105: If you fix this, maybe you should fix L1098, too.
  • msf.c L174: num_cells_used is only for the preferred parent, but msf_negotiated_cell_update_num_cells_used is called here regardless of dest_mac_addr. Is it OK?
  • msf-negotiated-cell.c L323: cell_to_add->data can be NULL here. It's dangerous.
  • msf_autonomous_cell_add: The case where the new AutoTxCell conflicts with the AutoRxCell is not handled. Is it a TODO?
  • msf-negotiated-cell.c L613: The correct comment is "look for an unused slot", I think.
  • msf-negotiated-cell.c L866: The correct condition is cell_list_len >= sizeof(sixp_pkt_cell_t) * MSF_6P_CELL_LIST_MIN_LEN, I think. I don't think we need this assertion, because add_send_response handles that case anyway.
  • msf-negotiated-cell.c L1166: Here, we should insert cell = NULL;. Otherwise, it's possible that cell != NULL && cell->channel_offset != channel_offset is true. In that case, L1266 will remove the cell, that is not actually the one requested by 6P DELETE.
  • msf-negotiated-cell.c L1577: This assertion fails if the node is the responder and it responds with an empty CellList.
  • prepare_cell_list_to_return: In fact it returns zero or one cell. Maybe we should rename it to prepare_cell_to_return to match the function signature.
  • msf-negotiated-cell.h L48: new_parent is misleading. It should be peer_addr.
  • msf.h L87, L100: typo: "leavning" -> "leaving"
  • msf-negotiated-cell.c L544: We might as well check if reserved_cell_head->link_options == 0, just in case.
  • BTW, I didn't review sax.c. Sorry about that.

debug-ito avatar Jul 10 '19 02:07 debug-ito

Hi Yatch,

Obviously this is a desirable feature and thanks for your hard work. I have not made a full pass on the code, but here are some initial comments / points for discussion. These might already be quite a lot!

Minor quibbles and annoyances:

  • I think the init function should print the parameters used by the code, such as the timeout of the MSF
  • Contiki-NG coding style reserves <> includes only for system libraries; use "" for all Contiki-NG files
  • the 6tisch minimal schedule has to be enabled (TSCH_SCHEDULE_WITH_6TISCH_MINIMAL set to nonzero) for MSF to work, but there is no warning / message when its set to 0. This could be made more user friendly (for example, a compilation warning added)
  • the code organization in Contiki-NG is not fully clear to me. I think it makes some sense to have MSF under os\services, especially as Orchestra is already there, but why is the 6top protocol under net\mac\tsch?
  • const sixtop_sf_t msf = { - maybe rename this to something more specific, e.g msf_scheduling_function, minimal_scheduling_function or same such to make it more easily grepable
  • LOG_INFO("delay the next request for %u seconds\n", (unsigned int)wait_duration); - should divide by CLOCK_SECOND

Major issues:

  • first of all, the code as provided does not compile on ARM. I'm using arm-none-eabi-gcc version 4.9.3 if that makes any difference.
  • the size of the logging code is an issue when trying to run this code on embedded hardware. I don't know of a particularly good way to solve it. Perhaps own log level for MSF would be a step in that direction. potentially, the logging can be reduced in the future, when the code gets more stable (e.g. move some things from INFO level to DBG level - although a lof of that logging is actually at the ERR level!). -- anyway, it would be nice to get MSF demo application the actually successfully link and work on the CC2650 launchpad and similar platforms!
  • the PACKETBUF_ATTR_PRIORITY attribute - ok, we need this, but probably not the way it's implemented - by dropping an existing packet from queue. That seems like a hack and is inappropriate for networks aiming for high reliability.
  • last but not least, the code was hitting assertion errors (an issue was already pointed out by debug-ito). An incorrect protocol behavior on some nodes obviously should not be able to trigger assertions on other nodes. (I'm not sure this is not the case right now - need to look at the code more carefully to conclude what was the reason).

I will send you a diff of the changes that I did to make it compile and not bail out with assertion errors.

atiselsts avatar Aug 29 '19 16:08 atiselsts

@debug-ito @atiselsts Thank you for your comments. I will address them.

Contiki-NG coding style reserves <> includes only for system libraries; use "" for all Contiki-NG files

Oh, I didn't know that. Thank you for pointing it out! 👍

yatch avatar Aug 29 '19 16:08 yatch

We have published some initial MSF results as part of our paper TSCH Networks for Health IoT: http://www.edi.lv/wp-content/uploads/2019/10/sphere-tsch-acm.pdf (Figure 4).

I must say that the PRs version of MSF is still showing worse results that I would expect for the SPHERE application; though the results are quite different from my last year's micro-implementation of MSF internet draft version 02.

atiselsts avatar Oct 28 '19 14:10 atiselsts

@atiselsts Thank you for sharing! I will look after my autumn holidays.

yatch avatar Oct 29 '19 01:10 yatch

Hallow Tanaka! i try this msf with my task - chain of 8 nodes -> border-router. it works much worse then old sixtop/node with simple-sf. Where i can give you my simulation project? can you open issues in your github repo?

alexrayne avatar Apr 15 '20 16:04 alexrayne

@alexrayne This code is outdated. A better implementation is on my machine, which is waiting to be evaluated.

can you open issues in your github repo?

It'd be nice if you could test the new one once it is pushed here. Thanks for testing.

yatch avatar Apr 15 '20 23:04 yatch

@alexrayne This code is outdated. A better implementation is on my machine, which is waiting to be evaluated.

can you open issues in your github repo?

It'd be nice if you could test the new one once it is pushed here. Thanks for testing.

i`ve no rights, to push here anything. provide my test example as PR for you? i was wrong about sixtop/simple-sf - it unpredictable hung cooja simulation, without any hope of diagnse reason. your msf not hungs, but it is loose 3rd-hop node, when works with some little traffic. without traffic it can slowly discover deep chain. and even works with it with little traffic.

alexrayne avatar Apr 15 '20 23:04 alexrayne

@yatch : can you explain about msf_callback_tsch_nbr_removed(nb) callback? why not used NETSTACK_CONF_ROUTING_NEIGHBOR_REMOVED_CALLBACK, like orchestra do? if it need, why not introduced somewhat TSCH_CONF_ROUTING_NEIGHBOR_REMOVED_CALLBACK, as more generic way to insert msf callback?

alexrayne avatar May 14 '20 09:05 alexrayne

Halow Tanaka! here i've provide some work on sixp, 6Pex. There provided support of concurent transfers, some optimisations, and some fix. Maybe it could interest you.

alexrayne avatar Jun 04 '20 16:06 alexrayne

Hallow Tanaka! i`ve finished developing your MSF service here MSF. now this well schedules net and resolves collosions on deep net chains - that i need. this proposed with:

  • module avoid establish register for busy cells, that take to account when allocate new cell. new cells are mus be free, and not collision with other cells.
  • detects collision for autonomous RX cells, and negotiate cells, to provide connection with parent without interferences.
  • provides more stable strategys for relocate cells (and allocate new one), providing ability to migrate chanel in same slot. If no free slots have, only such migration allow avoid from interference on busy chanel
  • provides NRSF protocol - this one shares busy/avoid cells register over neighbors. So every node have busy cells map over 1-3hop nbrs. And can choose new cells with no conflicts, or detects when need for relocation.

This work now tested only in cooja. now i starts look it in launchpad kits.

There is a lot of codestyle inconsistence with coktiki one. I not position this code as PR, but maybe you interests it solutions, to develop your one.

I can rebuild this changes-tree on contiki-ng develop banch, on demand. current implementation uses tsch stack very close to NG.

alexrayne avatar Jul 02 '20 12:07 alexrayne

Hallow Tanaka! i've relocate my MSF code on latest contiki-NG here MSF. and NRSF.
msf example builds well.

alexrayne avatar Sep 28 '21 23:09 alexrayne

@alexrayne What is NRSF?

atiselsts avatar Sep 29 '21 06:09 atiselsts

@alexrayne What is NRSF?

provides NRSF protocol - this one shares busy/avoid cells register over neighbors. So every node have busy cells map over 1-3hop nbrs. And can choose new cells with no conflicts, or detects when need for relocation.

i designed it to work in densed environment - where space of nodes ( slots * chanels ) not much, and conflicts are frequent. To resolve nodes (slots * chanels) in such suituation, i spreads maps of allocated nodes between nehgbors, so they can allocate nodes avoiding conflicts.

alexrayne avatar Sep 29 '21 11:09 alexrayne

@alexrayne This code is outdated. A better implementation is on my machine, which is waiting to be evaluated.

can you open issues in your github repo?

It'd be nice if you could test the new one once it is pushed here. Thanks for testing.

@yatch any word yet on the updated code? I might need MSF relatively soon and I could implement it myself but it's probably a better idea to use your latest version as a starting point at least

tinstructor avatar Jul 07 '22 14:07 tinstructor

@tinstructor I think I have no time to do something on this in the coming months. Hope the current version could help.

yatch avatar Jul 07 '22 22:07 yatch

Hi!

This was my mistake, very sorry about it, re-opening this PR now.

What happened is the following: I was switching the base branch for this repository https://github.com/wittra/contiki-ng from develop to wittra. But accidentally, I made the change on the wrong repo (this repo) and instead of switching the base branch I renamed it. And somehow github deleted develop and closed all PRs...

Many apologies for this mishap 🙏; I haven't contributed in a while.. but now at least everybody got some notification from me :p

simonduq avatar Aug 08 '23 16:08 simonduq