FreeRTOS-Plus-TCP

[RFC] Add priority queues

Open go2sh opened this issue 1 year ago • 5 comments

Add priority queues.

Description

This PR adds priorities to sockets and packets, and multiple event queues to handle packets. This change enables several things within the time-sensitive networking space:

  • The stack can handle packets based on their priority, so that high-priority packets can overtake others and idle background traffic uses the remaining bandwidth. An example would be audio/video data or PTP messages as high priority and a data dump in the background.
  • The packet priority can be used by network drivers to assign the packet to a different hardware queue.
  • The HW can add VLAN tags with priority fields. (And maybe later FreeRTOS-Plus-TCP :) )
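
A minimal sketch of the strict-priority idea behind the first point, in plain C without any FreeRTOS dependency. All names here (`Event_t`, `EventQueue_t`, `prvEnqueue`, `prvDequeueHighest`, `NUM_QUEUES`, `QUEUE_DEPTH`) are hypothetical and are not taken from the PR; the real stack would use FreeRTOS queues rather than these ring buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES   3   /* hypothetical: 0 = best effort, 2 = high priority */
#define QUEUE_DEPTH  8   /* hypothetical depth of each ring buffer */

typedef struct {
    uint32_t ulPacketId;   /* stand-in for a network buffer pointer */
    uint8_t  ucPriority;   /* packet priority carried with the event */
} Event_t;

typedef struct {
    Event_t xItems[QUEUE_DEPTH];
    size_t  uxHead;        /* index of the oldest event */
    size_t  uxCount;       /* number of queued events */
} EventQueue_t;

static EventQueue_t xQueues[NUM_QUEUES];

/* Append an event to one queue; returns 0 if the index is invalid or the queue is full. */
static int prvEnqueue(size_t uxQueue, Event_t xEvent)
{
    if (uxQueue >= NUM_QUEUES) {
        return 0;
    }
    EventQueue_t *pxQ = &xQueues[uxQueue];
    if (pxQ->uxCount == QUEUE_DEPTH) {
        return 0;
    }
    pxQ->xItems[(pxQ->uxHead + pxQ->uxCount) % QUEUE_DEPTH] = xEvent;
    pxQ->uxCount++;
    return 1;
}

/* Strict priority: always serve the highest-numbered non-empty queue first. */
static int prvDequeueHighest(Event_t *pxOut)
{
    assert(pxOut != NULL);
    for (size_t i = NUM_QUEUES; i-- > 0;) {
        EventQueue_t *pxQ = &xQueues[i];
        if (pxQ->uxCount > 0) {
            *pxOut = pxQ->xItems[pxQ->uxHead];
            pxQ->uxHead = (pxQ->uxHead + 1) % QUEUE_DEPTH;
            pxQ->uxCount--;
            return 1;
        }
    }
    return 0;   /* all queues are empty */
}
```

With this scheme, an event placed in queue 2 after several events were already placed in queue 0 is still returned first by `prvDequeueHighest`, which is the "overtake" behaviour described above. Note that strict priority can starve the lower queues if high-priority traffic never stops.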

Test Steps

Enable the feature via ipconfigPRIORITIES && ipconfigEVENT_QUEUES and add two UDP sockets. Give the first socket a higher priority (e.g. 5). The second socket sends a lot of packets, and afterwards the first socket sends one. That packet should overtake some of the second socket's packets.

Checklist:

  • [X] I have tested my changes. No regression in existing tests.
  • [ ] I have modified and/or added unit-tests to cover the code changes in this Pull Request.

Related Issue

#894

TODOs

  • [ ] Make TCP sockets adhere to the priority. This would basically mean that a TCP packet must be placed back into the queues before sending and cannot be sent directly.
  • [ ] Add unit tests for the new features.
  • [ ] Add a distinction between packet and socket priorities including separate mapping.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

go2sh avatar Jun 15 '23 20:06 go2sh

A general comment about priorities and queues:

  • The number of priorities is chosen based on the 3-bit PCP field of the VLAN tag. (See https://en.wikipedia.org/wiki/IEEE_P802.1p) Other possibilities are the TOS field from the IP header (4 bit / 16 numbers) or simply the number of queues. It's really application dependent. I think the most useful is the VLAN PCP variant with 8. In my case: the NetworkInterface (an Infineon TC3 and TC4 port) inserts and removes a VLAN tag via the descriptors, and the PCP value is used to assign dedicated TX and RX hardware queues. Also, the PCP field of an RX packet is written as the priority into the NetworkBufferDescriptor_t of the received packet, overwriting the default value and thus determining the priority handling of RX events.
  • The number of queues is chosen based on the default number 3 from the tc-prio scheduler of the Linux kernel. (https://www.man7.org/linux/man-pages/man8/tc-prio.8.html) The idea behind 3 is: One queue for best effort. No guarantees at all. Gets what is left. One queue for normal app traffic like HTTP/FTP/MQTT etc. One queue for high-priority traffic like low-latency or real-time traffic (e.g. audio/video, real-time sense and control, or time sync). The number of queues might be equal to the number of priorities, but for most applications the number of packets with a high priority is rather small compared to the rest of the traffic. So more queues might waste resources, since only a few messages are stored in them.
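
To make the two points above concrete, here is a small sketch of how a 3-bit PCP value could be taken from a VLAN tag and folded onto three queues. The TCI layout (PCP in the top 3 bits of the 16-bit field) follows IEEE 802.1Q; the function names and the mapping table are assumptions for illustration only and are not taken from the PR.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIORITIES  8U   /* values representable by the 3-bit PCP field */
#define NUM_QUEUES      3U   /* best effort, normal, high priority */

/* Extract the PCP from a VLAN TCI given in host byte order.
 * TCI layout: PCP (3 bits) | DEI (1 bit) | VID (12 bits). */
static uint8_t ucPcpFromTci(uint16_t usTci)
{
    return (uint8_t)((usTci >> 13) & 0x7U);
}

/* Hypothetical mapping of 8 priorities onto 3 queues; a real
 * application would choose its own table. */
static const uint8_t ucPriorityToQueue[NUM_PRIORITIES] = { 0, 0, 1, 1, 1, 1, 2, 2 };

/* Look up the queue index for a priority in the range 0..7. */
static uint8_t ucQueueForPriority(uint8_t ucPriority)
{
    assert(ucPriority < NUM_PRIORITIES);
    return ucPriorityToQueue[ucPriority];
}
```

For example, a TCI of `0xA064` carries PCP 5 and VLAN ID 100. Keeping the table separate from the number of priorities is what allows fewer queues than priorities, as argued in the second point.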

I don't mind the numbers at all. I just wanted to choose a reasonable default, since anybody can override it relatively easily.

A comment on the application: I work at Infineon and I am porting FreeRTOS+TCP to our TC3 and TC4 microcontrollers. These are rather powerful devices with a lot of cores (up to 6), a lot of SRAM (a few MB) and up to 5G Ethernet MACs (with 8 DMAs and 8 hardware queues each for RX and TX). We run relatively complex applications with multiple tasks working on network traffic from different application domains: SOTA, CAN tunneling, service handling (like SOMEIP or MQTT) and real-time data like audio and video (or radar) streams. This patch is a first step towards removing some bottlenecks in the stack and fulfilling the real-time requirements.

As a concrete example: I have an app which runs some low-priority debug instrumentation, an MQTT client and some audio processing. Since the system has a lot of RAM, the number of NetworkBuffers is high (e.g. 128). This led to the instrumentation filling up the single queue, and the audio task had no chance of meeting its real-time latency requirement. With the multiple-queue approach, the instrumentation runs at the lowest priority and gets the rest of the bandwidth. It's no problem if a packet waits a bit; the overall bandwidth of the system is sufficient. The MQTT traffic runs at the middle priority and the audio and time-sync traffic at the highest (only a few 100 kBit/s, but very latency sensitive). With the multiple-queue approach, I don't need to take any measures within the software to meet my real-time goals.
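
A rough back-of-the-envelope sketch of why a full single queue hurts latency: the time a newly queued packet waits behind everything ahead of it. The function name, the 1500-byte frame size and the link rates used below are assumptions for illustration; the thread does not state them, and the calculation ignores preamble, inter-frame gap and any processing time in the IP task.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds to transmit ulPackets frames of ulBytesPerPacket
 * bytes each over a link of ullLinkBitsPerSecond (payload bits only). */
static uint64_t ullDrainTimeUs(uint32_t ulPackets,
                               uint32_t ulBytesPerPacket,
                               uint64_t ullLinkBitsPerSecond)
{
    assert(ullLinkBitsPerSecond > 0U);
    uint64_t ullBits = (uint64_t)ulPackets * ulBytesPerPacket * 8U;
    return (ullBits * 1000000U) / ullLinkBitsPerSecond;
}
```

Under these assumptions, 128 queued frames of 1500 bytes take about 15.4 ms to drain at 100 Mbit/s and about 1.5 ms at 1 Gbit/s. A latency-sensitive packet stuck at the back of a single queue would wait that long, while a separate high-priority queue lets it skip the backlog.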

I hope this makes this a bit more clear. :)

go2sh avatar Jun 16 '23 15:06 go2sh

One more comment regarding RX events. The reason I also added priorities for RX events is the following: think of a system with a lot of NetworkBuffers where the IP task starts blocking until a descriptor entry is free in the descriptor table. During that time no RX packet handling happens, as the IP task is blocked. The high-priority queue makes it possible for a high-priority packet to be processed a bit earlier than it otherwise would be. The ideal solution would be to split RX and TX packet processing. Then no extra queues for RX would be needed, as the traffic already comes in prioritized.

go2sh avatar Jun 16 '23 19:06 go2sh

Thanks Christoph for creating the PR. We will discuss the design further internally and, if needed, I will sync up with you on the design as well and take the change forward. Please note that the actual merge might happen only in August. We have an important GA release coming up and we will be freezing the branch in a week. However, we will make sure that we merge the change as soon as the release tagging is done. Thanks for your patience.

shubnil avatar Jun 17 '23 04:06 shubnil

@go2sh thoughts on mq-prio?

amazonKamath avatar Jun 19 '23 16:06 amazonKamath

> @go2sh thoughts on mq-prio?

That is the final step of my TSN chain.

I would see this functionality as part of the network interface. In my case, the driver checks the NetworkBuffer's ucPriority and uses a custom mapping array to map packets to the different DMA queues. But each device does it differently, and I know that a lot of other devices which are already part of FreeRTOS+TCP support multiple HW queues.
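
A sketch of what such a driver-side lookup might look like. Only the `ucPriority` field name comes from the discussion; the descriptor type, the number of DMA queues, the table contents and the function name are hypothetical, and the actual Infineon driver is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PRIORITIES   8U
#define NUM_DMA_QUEUES   4U   /* hypothetical; depends on the MAC */

/* Minimal stand-in for a network buffer descriptor. */
typedef struct {
    uint8_t ucPriority;       /* packet priority set by the stack */
} PacketDescriptor_t;

/* Hypothetical per-device table: priority index to TX DMA queue. */
static const uint8_t ucPriorityToDmaQueue[NUM_PRIORITIES] = { 0, 0, 1, 1, 2, 2, 3, 3 };

/* Pick the TX DMA queue for a packet, clamping out-of-range priorities
 * to the highest valid priority. */
static uint8_t ucSelectTxDmaQueue(const PacketDescriptor_t *pxDescriptor)
{
    assert(pxDescriptor != NULL);
    uint8_t ucPriority = pxDescriptor->ucPriority;
    if (ucPriority >= NUM_PRIORITIES) {
        ucPriority = (uint8_t)(NUM_PRIORITIES - 1U);
    }
    return ucPriorityToDmaQueue[ucPriority];
}
```

Keeping the table inside the driver matches the point above that each device maps priorities to hardware queues differently, so the stack only needs to carry the priority along with the buffer.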

go2sh avatar Jun 20 '23 10:06 go2sh