athp icon indicating copy to clipboard operation
athp copied to clipboard

memory modified after free / taskq node_alloc/free{_cb}

Open bzfbd opened this issue 4 years ago • 3 comments

There are cases when node_alloc() and node_free() are called but the node_alloc_cd() has not run yet. This can lead to node_free() freeing the *ni before the node_alloc_cb() runs which then modifies memory (in this case the "is_in_peer_table") after free (in addition to inserting state which is wrong).

To address this do multiple things:

  • change the is_in_peer_table from int to bool as that is what it is and use a macro and ath10k_dbg() to track state changes. This was mostly for debugging.
  • While here move athp_node_alloc_state into if_athp_main.c as it is not used anywhere outside the function. Helps understanding the code.
  • Store the allocated taskq entry in node_alloc() in "alloc_taskq_e" so we can check it during node_free() and cancel the callback if needed.
  • Rework parts of the athp_taskq to allow us to reliably cancel the entry: ad d a second list where we put the entries we are executing. Walking that list can be done lock-less. Add a athp_taskq_cancel() function which will either take the entry of the taskq or wait that no entries are run before returning. While there, remove extra () around the locking macros, remove an early (extra) on_queue = 1 in athp_taskq_queue() and fold some print lines into less vertical space.

Fixes Issue #28.

Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")

bzfbd avatar Jun 02 '20 11:06 bzfbd

Should also compile now after adding the missing '"'.

bzfbd avatar Jun 04 '20 11:06 bzfbd

I'm going to try splitting up the node allocate and the node initialisation bits in net80211 to try and aide in this. At least then we can take a reference on the node.

I still think we'll need some flavour of what you're doing because of how the temporary node stuff is implemented (ie they're immediately deleted!) and how that could lead to some fun stuff in back to back node creation/deletion pulling the peer out of the firmware at the wrong time. It turns out if you screw THAT up then the firmware transmit path hangs...

erikarn avatar Jun 11 '20 22:06 erikarn

I'd suggest applying it first and then separating it out and not doing it both in one go. I am not quite there yet to do all the net80211 direct handling. If we are going to specific "linux-style" calls from net80211 into the driver I think it would be advantageous to keep them as similar in name, arguments, and behaviour as we can, as all that will mean less fiddling when porting drivers.

bzfbd avatar Jun 12 '20 15:06 bzfbd