Assign element to cpu thread
Hello, I am new to Click and am using it at user level. I have two questions. Is it possible to:
- assign an element in a configuration file to a CPU thread, and
- pin this thread to a core?
Use StaticThreadSched(elementname 1); to pin elementname to thread 1. With --dpdk, pinning threads to cores is the default. In standard user-level Click you have the "-a" flag.
So StaticThreadSched(FromDevice(eth1) 1)?
And when I run the Click configuration file, should I run it with -a?
Check https://github.com/kohler/click/wiki/Language for the language ;)
Well, if you use DPDK elements (which, from the other discussion, I guess you want), you need to launch with --dpdk.
For -a, it depends on what you want. Nowadays people advocate for run-to-completion, so you should use -a.
This will give you the click basics, such as naming too : https://github.com/tbarbette/fastclick/wiki/Tutorial
Thanks @tbarbette, I have not managed to integrate Click with DPDK yet. I am using plain user-level Click.
My scenario has a source (FastUDPSource -> ToDevice), a VNF (FromDevice -> Queue -> ToDevice), and a sink (FromDevice -> Counter -> Discard), each running on a separate node. First I want to measure the throughput: I change the RATE of packets sent per second and measure the rate that reaches the sink.
George, I have a vague idea of the issue you are trying to solve with thread pinning and suspect the solution might not work as intended, but I think in theory what you are asking could be achieved with:
click -a -j3 sink-sender.click, with elements scheduled by: StaticThreadSched(elementname1 0, elementname2 0, ... 0);
click -a -j3 vnf-bridge.click, with elements scheduled by: StaticThreadSched(elementname1 1, elementname2 1, ... 1);
click -a -j3 sink-receiver.click, with elements scheduled by: StaticThreadSched(elementname1 2, elementname2 2, ... 2);
This assumes each node has access to 3 threads. It also assumes CPU thread 0 on node1 and CPU thread 0 on node 2 etc will be scheduled on the same host CPU thread, which may or may not be the case. Perhaps limiting each node to 1 thread and pinning each node to a specific core on the host would give you more control?
Thank you @ahenning. To be more precise, I want to test two different scenarios:
- The FromDevice element and the ToDevice element in the vnf-bridge run on the same core.
- They run on different cores.
So, if I understood what you wrote, I must write a Click configuration file with something like StaticThreadSched(FromDevice 1, ToDevice 1)? And how do I link them with the Queue element?
And after this I have to execute click -a -j3 vnf-bridge.click
I'm sorry if I'm wrong; I have only just started working with Click.
@p4pe StaticThreadSched takes element names, not a declaration. Please take the time to read the links I gave you :) Looking at the examples in the "conf" folder will help too.
You can give a value to -a to pin at a certain offset. So you can simply use "-a 2 -j 1" and every element of that click will run on core 2. If you want to use multiple cores in a single instance, you can use "-a 2 -j 2" and use StaticThreadSched(elementnameA 0, elementnameB 1); to pin elementnameA and elementnameB to core 2 and 3.
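The offset rule described above can be sketched in a few lines of Python. This is only a model of the mapping as stated in this thread (thread i lands on core offset + i); the function name thread_to_core is made up for illustration and is not part of Click.

```python
def thread_to_core(num_threads, offset=0):
    """Model of `click -a <offset> -j <num_threads>` as described in this thread.

    Returns a dict mapping each Click thread index to the core it would be
    pinned to, assuming thread i is pinned to core offset + i.
    """
    if num_threads < 1:
        raise ValueError("need at least one thread")
    if offset < 0:
        raise ValueError("offset must not be negative")
    return {thread: offset + thread for thread in range(num_threads)}


# "-a 2 -j 1": the single thread runs on core 2.
print(thread_to_core(1, offset=2))  # {0: 2}

# "-a 2 -j 2": StaticThreadSched(elementnameA 0, elementnameB 1) ends up on cores 2 and 3.
print(thread_to_core(2, offset=2))  # {0: 2, 1: 3}
```

The indexes passed to StaticThreadSched are always the keys of this mapping, never the core IDs on the right-hand side.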
OK, now I think I've got it. Thank you @tbarbette.
I built Click with ./configure --enable-userlevel --disable-linuxmodule --enable-user-multithread --enable-multithread
but when I run click -j config
I get this warning: Click was built without multithread support, running single threaded
For the second scenario the config would look something like:
FromDevice -> Queue -> unq1::Unqueue -> ToDevice;
StaticThreadSched(unq1 1);
FromDevice should run on thread 0, and packets pushed to and processed by ToDevice should run on thread 1. If the configuration and elements are more complicated, Click has a home thread function that you could add to your elements to verify if needed.
Not all elements are thread safe, so one way would be to place the elements you want to run on specific core between two queues e.g.
FromDevice -> Queue -> unq0::Unqueue -> SlowPathElement -> Queue -> unq1::Unqueue -> ToDevice;
StaticThreadSched(unq0 1, unq1 0)
This is assuming the whole config is more complicated and the idea is to only run the resource heavy elements on a dedicated core and the rest on thread 0. This info might not be relevant to your use case but I am just adding that here for posterity's sake.
Also, if the element timers need to run on say the same thread 1, then the actual element also needs to be scheduled via StaticThreadSched and not just the pull to push converter like Unqueue.
I appreciate your help @ahenning. I'm playing with this now, but I realized that even though I ran ./configure --enable-userlevel --disable-linuxmodule --enable-user-multithread --enable-multithread
Click was not built with multithreading enabled, and I'm trying to see why.
Did you make clean then make again? Weird.
I did a fresh installation on a new machine, and now it works.
Hello @tbarbette, I am trying to pin elements to threads, but I get this error: router configuration specified twice
My Click configuration file is:
FromDevice(enp4s0f1) -> Queue -> ToDevice(enp4s0f0);
StaticThreadSched(FromDevice 1, ToDevice 0);
And I am running Click with: click -a 2 -j 2 forwarder.click
Hi,
First, you need to create an instance of From/ToDevice as follows:
in :: FromDevice(enp4s0f1); out :: ToDevice(enp4s0f0);
Then describe your pipeline:
in -> Queue -> out;
Finally, pin each instance to the correct thread:
StaticThreadSched(in 1, out 0);
The way you did it, Click creates a different instance for every From/ToDevice call. This is why the output states that router configuration is specified twice.
Thank you @gkatsikas, but I have the same issue with this click configuration file:
in::FromDevice(enp4s0f1); out::ToDevice(enp4s0f0);
in -> Queue -> out;
StaticThreadSched(in 1, out 0);
I'm following this issue because I also have to study a very similar case. My setup is as follows: I have three nodes in the order below, each running a different Click configuration:
Node1 (source) --> Node2 --> Node3 (sink)
Node1: Source.click
FastUDPSource(800000, 10000000, 60, 3c:fd:fe:04:64:42, 192.168.6.2, 1234, 14:18:77:26:68:15, 192.168.6.5, 1234) -> ToDevice(eth1);
Node2:
in::FromDevice(eth1); out::ToDevice(eth2);
in -> Queue -> out;
StaticThreadSched(in 1, out 0);
Node3:
FromDevice(eth1) -> c::Counter -> Discard;
DriverManager(wait 45s, save c.rate -, stop);
On node2 we run Click with the command click -j 2 forwarder.click, and the approximate rate at node3 is around 360619,21
whereas with click -a -j 2 the rate is 467960,45
What is the actual difference between these two commands? I mean, how does the affinity switch work, and why does it change the rate?
Also, using both the affinity and thread switches like this: click -a 2 -j 2 forwarder.click returns the same error, "router configuration specified twice", whereas if the affinity switch is given but left empty, i.e. click -a -j 2 forwarder.click, it runs normally.
How would I go about pinning the threads to two cores a) of a CPU in one socket, b) in different sockets?
Kindly thank you @tbarbette @ahenning and @gkatsikas for your input
I'd advise to keep "-a" empty, and play with the affinity inside Click. If you want to offset by two, just pin elements to thread 2 and 3. For the performance : without -a you leave the OS switching threads around and it's actually not very good at that.
With the forwarder using two cores, I'd expect the sink or source to become the bottleneck. But I'd advise using DPDK as soon as performance matters.
Some random advice:
- FastUDPSource also takes a STOP argument that stops Click when it finishes generating packets, so you can be sure you are not measuring the rate while no traffic is being forwarded.
- That, or use a LIMIT of -1.
- Similarly, I prefer AverageCounter to Counter, as it computes the rate between the first and last packet. In the FastClick branch it has "link_rate", which gives the rate in bps.
- Check with htop what's the bottleneck on the 3 machines :)
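To illustrate why measuring between the first and last packet matters, here is a deliberately simplified Python model. It does not reproduce the exact formulas used by Counter or AverageCounter; the function names and the numbers are hypothetical and only show how idle time at the start or end of a run drags an average down.

```python
def rate_over_whole_run(packets, run_seconds):
    """Average rate when the packet count is divided by the full run time."""
    return packets / run_seconds


def rate_first_to_last(packets, first_ts, last_ts):
    """Average rate over the interval in which packets actually arrived."""
    if last_ts <= first_ts:
        raise ValueError("last packet must arrive after the first one")
    return packets / (last_ts - first_ts)


# Hypothetical numbers: 1,000,000 packets arrive between t=5 s and t=15 s
# of a 45 s measurement window.
print(round(rate_over_whole_run(1_000_000, 45.0)))      # 22222
print(round(rate_first_to_last(1_000_000, 5.0, 15.0)))  # 100000
```

The same traffic yields very different figures depending on how much idle time is included, which is why stopping the run when the source finishes is also suggested above.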
@IoakeimFotoglou I see we have the same issues. @tbarbette I will follow your advice too.
Every time I try to play with the affinity inside Click, I get this warning: forwarder.click:6: While configuring ‘StaticThreadSched@4 :: StaticThreadSched’: warning: thread preference 2 out of range
I'm using the configuration that @gkatsikas proposed.
Thank you in advance.
With the configuration
in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0);
in -> Queue -> out;
StaticThreadSched(in 0, out 1);
and click -a -j 2 forwarder.click, everything works fine.
With the configuration
in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0);
in -> Queue -> out;
StaticThreadSched(in 2, out 3);
and click -a -j 2 forwarder.click
I have the issue I mentioned above.
Well, in the second configuration you explicitly ask for threads 2 and 3 in StaticThreadSched, but you call Click with only 2 threads (i.e., -j 2, which implies that only threads 0 and 1 will be allocated). If you bump -j to 4 instead of 2 it should work.
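The range check behind that warning can be modelled with a short Python sketch. It assumes only what is stated in this thread (with -j N, the valid thread indexes are 0 to N-1); the helper name out_of_range is hypothetical and this is not Click's own code.

```python
def out_of_range(assignments, num_threads):
    """Return the element names whose thread index is not in 0..num_threads-1.

    `assignments` mirrors the arguments of StaticThreadSched as a dict of
    element name -> thread index; `num_threads` is the value passed to -j.
    """
    return sorted(
        name for name, thread in assignments.items()
        if not 0 <= thread < num_threads
    )


sched = {"in": 2, "out": 3}   # StaticThreadSched(in 2, out 3)
print(out_of_range(sched, 2))  # ['in', 'out']  -> both are rejected with -j 2
print(out_of_range(sched, 4))  # []             -> fine with -j 4
```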
Obviously I misunderstood something.
What I had understood so far is: StaticThreadSched(in 0, out 0) means that I have two threads running on one core (core 0).
StaticThreadSched(in 0, out 1) means that I ask for two threads (0 and 1), and with -a -j 2 in the call, this configuration runs on cores 0 and 1.
The "conflict" arose when I tried to run Click on cores other than 0 and 1. I configured StaticThreadSched(in 2, out 3) and thought this meant that Click would run on cores 2 and 3.
Thank you kindly for your input and advice @gkatsikas
No, the thread index in StaticThreadSched does not necessarily correspond to a physical CPU core ID; it is simply a thread index. To have full control over how those threads are pinned to physical cores, I suggest that you use the DPDK-based FastClick instead of Click.
OK, thanks for the explanation. I want to use Click first.
So what do you suggest for better management of core pinning? Maybe the "taskset" command?
If I want two threads on one core, I would have StaticThreadSched(in 0, out 0)
and run click -j 2
Your first two points were correct. I think what you missed is that a Click thread can run multiple elements. It's like user-level threads. So with
StaticThreadSched(in 0, out 0)
You pin both elements to thread 0. As you pass -a, thread 0 means core 0. Core 1 is there but does nothing. Similarly, you can pass -j 4 and assign threads 2 and 3 to the in and out elements; threads 0 and 1 will do nothing. It is not correct to assign elements to threads 2 and 3 if you launched Click with 2 threads, as 2 and 3 are out-of-bounds indexes. That is the message you get.
Taskset will not work because if two elements are on the same thread there is nothing you can do about it.
For completeness, -a takes a parameter that offsets the assignment of threads to cores. With -a 2, thread 0 will be pinned to core 2 and thread 1 to core 3. So in that case you would pin in and out to threads 0 and 1, which will be running on cores 2 and 3.
What DPDK adds is the ability to define a list of cores: if you pass 3,7,10, thread 0 would be pinned to core 3, thread 1 to core 7, and thread 2 to core 10.
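The core-list behaviour can be pictured with a small Python sketch. It only models the mapping described in the previous paragraph (the n-th core in the list hosts thread n); it does not parse real DPDK EAL options, and the function name cores_from_list is made up for illustration.

```python
def cores_from_list(core_list):
    """Map Click thread indexes to cores given a comma-separated core list.

    The n-th entry of the list is the core that thread n is pinned to.
    """
    cores = [int(core) for core in core_list.split(",") if core.strip()]
    return dict(enumerate(cores))


print(cores_from_list("3,7,10"))  # {0: 3, 1: 7, 2: 10}
```

With such a list, StaticThreadSched(in 0, out 2) would place in on core 3 and out on core 10 under this model.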
My suggestion would be to run click with -a -j 16 if you have 16 cores and never think about this again. You then pin elements to thread indexes that correspond exactly to cores. If a core has nothing assigned to it, it won't run anything, so you don't really need to care.
OK, now I think I get it.
Suppose I want to run FromDevice and ToDevice in two different threads and assign these threads to cores on different sockets.
If we assume that core 2 and core 4 are on different sockets, the configuration will be
StaticThreadSched(in 2, out 4)
run with click -a -j 16
Thank you @tbarbette
Your configuration should work even with -j 5. Note that pinning a FromDevice to socket 0 and a ToDevice to socket 1 implies inter-socket communication, which is costly in terms of performance. In your case it does not matter though, as you use vanilla Click, which can hardly stress QPI.
I know @gkatsikas, this performance degradation is exactly what I want to observe!
I have 4 different scenarios:
1) FromDevice and ToDevice running on the same core without hyperthreading
2) On the same core with HT
3) Different cores in the same socket
4) Different cores in different sockets
If I'm right, scenario (4) will have the worst performance (more or less) due to the inter-socket communication.
Yes, this is likely the case, although QPI effects may be obscured by some artefacts of your setup, such as memory copies from/to user space. You may also try kernel-based Click or fast user-space Click (with DPDK) to eliminate this overhead.
Unfortunately I did not manage to install kernel-based Click (I think it is not compatible with recent Linux headers). The next step is to try Click with DPDK, but first I have to take a look at DPDK because I am new to it.
One last question, just for confirmation. For my first scenario I just have
in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0); in -> Queue -> out;
without StaticThreadSched
and just run click forwarder.click
Yes (provided that you have disabled HT)
I'd say to always pin them, even for case 1.
Also you have to consider that without DPDK you're not pinning actually most of the RX work. Packets will be received by the kernel on probably all cores (the default for the NIC is to use as many queues as cores) through the interrupts handler, no matter how the application is pinned. They will go through the kernel stack on all those cores, this is some heavy work, before the app reads the packets from a single given core. Therefore if you really want to test QPI with the kernel sockets, you'll need to consider the number of queues (ethtool -L) and irq affinity.
Just a thought: similarly, as your device is attached to a specific CPU, the packets will actually never be moved to the second CPU in the setup you present, just the Click metadata. You may want to "touch" the bytes on the second CPU. "CheckIPHeader -> SetTCP(or UDP)Checksum" should do the trick.
Thank you all for your help.
I believe I managed to install FastClick, and now I will try to run the same "experiment" and see the difference.
If I understood correctly, the only change I have to make is to replace ToDevice and FromDevice with ToDPDKDevice and FromDPDKDevice, using the interfaces that are bound to DPDK, and then run Click with click --dpdk
Mostly yes. You don't need a Queue also ;)
Hi, I am studying a case similar to yours. In FastUDPSource, is SRCETH node1's Ethernet address, SRCIP node1's IP address, DSTETH node2's Ethernet address, and DSTIP node2's IP address?
Hello, in my case SRCETH and SRCIP are the MAC and IP addresses of node1, but DSTETH and DSTIP are those of node3 (the sink).
My topology is source ---> VNF ---> sink
Thanks! But what script runs on your node2 (VNF)? Simply in -> Queue -> out? How is a packet sent from node1 to node2 and then from node2 to node3? Sorry, I'm new to Click and to these networking problems.
Yes, just FD->Queue->TD.
You have to enable promiscuous mode on the in and out interfaces so that all the traffic can pass.
But it seems my packets are directly sent to node3 by node1 and ignore node2.
If you run tcpdump on the ingress port of node2, what do you see?
Oh, with tcpdump I can see the packets from node1 to node3, so I think it works. Thank you very much!
You are welcome! You can also use the IPPrint element, for checking.
Sorry to trouble you again after such a long time. I am still new to Click and suspect my setup is not working properly. Here is my setup: Node1 (source) --> Node2 (forward) --> Node3 (sink). I want my packets to be transmitted hop by hop.
The three nodes are configured as follows:
Node1 ens33: 192.168.32.128 00:0c:29:92:68:92 ; ens38: 192.168.32.129 00:0c:29:92:68:9c
Node2 ens33: 192.168.32.130 00:0c:29:57:e6:e1 ; ens38: 192.168.32.131 00:0c:29:57:e6:eb
Node3 ens33: 192.168.32.132 00:0c:29:db:6e:56 ; ens38: 192.168.32.133 00:0c:29:db:6e:60
Node1: Source.click
FastUDPSource(800000, -1, 60, 00:0c:29:92:68:92, 192.168.32.128, 1234, 00:0c:29:db:6e:56, 192.168.32.132, 1234) -> IPPrint("Hello") -> ToDevice(ens33);
Node2: Forward.click
FromDevice(ens38, PROMISC true) -> Queue -> IPPrint -> ToDevice(ens33);
Node3: Sink.click
FromDevice(ens33, PROMISC true) -> c::Counter -> Discard;
Script(wait 10, print c.rate, loop);
I can see the result on node3. Running tcpdump on node2 shows the packet 192.168.32.128.1234 > 192.168.32.132.1234, but the IPPrint element prints nothing. How can I make sure that the packets are sent from node1 to node2 and then from node2 to node3? Sorry again for bothering you.
Hello, I think you have to rewrite the MAC address at every node a packet arrives at.
Thanks! Do you mean I should set node1's DST to node2, then on node2 use the EtherRewrite element and set its DST to node3?