click Assign element to cpu thread

Hello, I am a newbie to Click, and I am using Click at user level. I have two questions. Is it possible to:

1. Assign an element in a configuration file to a CPU thread?
2. Pin this thread to a core?

Oct 20 '20 12:10 p4pe

Use StaticThreadSched(elementname 1); to pin elementname to thread 1. As for pinning threads to cores: with --dpdk it's the default; in standard userlevel you have the "-a" flag.

Oct 20 '20 16:10 tbarbette

So StaticThreadSched(FromDevice(eth1) 1)?

And when I run the Click configuration file, should I run it with -a?

Oct 20 '20 16:10 p4pe

Check https://github.com/kohler/click/wiki/Language for the language ;)

Well, if you use DPDK elements (from the other discussion I guess you want to), you need to launch with --dpdk.

For -a, it depends on what you want. Nowadays people advocate for run-to-completion, so you should use -a.

This will also give you the Click basics, such as naming: https://github.com/tbarbette/fastclick/wiki/Tutorial

Oct 20 '20 16:10 tbarbette

Thanks @tbarbette, I have not managed to integrate Click with DPDK yet. I'm just using user-level Click.

I have a scenario in which I run a source (FastUDPSource -> ToDevice), a VNF (FromDevice -> Queue -> ToDevice) and a sink (FromDevice -> Counter -> Discard), each running on a separate node. First I want to measure the throughput: I change the RATE (packets sent per second) and I count the rate that reaches the sink.

Oct 20 '20 17:10 p4pe

George, I have a vague idea of the issue you are trying to solve with thread pinning and suspect the solution might not work as intended, but I think in theory what you are asking could be achieved with:

click -a -j3 sink-sender.click, scheduling elements with: StaticThreadSched(elementname1 0, elementname2 0, ... 0);

click -a -j3 vnf-bridge.click, scheduling elements with: StaticThreadSched(elementname1 1, elementname2 1, ... 1);

click -a -j3 sink-receiver.click, scheduling elements with: StaticThreadSched(elementname1 2, elementname2 2, ... 2);

This assumes each node has access to 3 threads. It also assumes CPU thread 0 on node1 and CPU thread 0 on node 2 etc will be scheduled on the same host CPU thread, which may or may not be the case. Perhaps limiting each node to 1 thread and pinning each node to a specific core on the host would give you more control?

Oct 21 '20 10:10 ahenning

Thank you @ahenning. To be more precise, I want to test 2 different scenarios:

1. The FromDevice element and the ToDevice element in the vnf-bridge run on the same core.
2. They run on different cores.

So if I understood what you wrote, I must write a Click configuration file with something like StaticThreadSched(FromDevice 1, ToDevice 1)? And how do I link them with the Queue element?

And after this I have to execute click -a -j3 vnf-bridge.click

I'm sorry if I'm wrong, but I have only just started working with Click.

Oct 21 '20 11:10 p4pe

@p4pe StaticThreadSched takes element names, not a declaration. Please take the time to read the links I gave you :) Looking at the examples in the "conf" folder will help too.

You can give a value to -a to pin at a certain offset. So you can simply use "-a 2 -j 1" and every element of that click will run on core 2. If you want to use multiple cores in a single instance, you can use "-a 2 -j 2" and use StaticThreadSched(elementnameA 0, elementnameB 1); to pin elementnameA and elementnameB to core 2 and 3.

Oct 21 '20 14:10 tbarbette

OK, now I think I got it. Thank you @tbarbette.

I built Click with ./configure --enable-userlevel --disable-linuxmodule --enable-user-multithread --enable-multithread but when I run click -j config

I get this warning: Click was built without multithread support, running single threaded

Oct 21 '20 14:10 p4pe

For the second scenario the config would look something like:

FromDevice -> Queue -> unq1::Unqueue -> ToDevice; StaticThreadSched(unq1 1);

FromDevice should run on thread 0, and packets pushed to and processed by ToDevice should run on thread 1. If the configuration and elements are more complicated, Click has a home thread function that you could add to your elements to verify if needed.

Not all elements are thread safe, so one way would be to place the elements you want to run on a specific core between two queues, e.g.

FromDevice -> Queue -> unq0::Unqueue -> SlowPathElement -> Queue -> unq1::Unqueue -> ToDevice; StaticThreadSched(unq0 1, unq1 0)

This is assuming the whole config is more complicated and the idea is to only run the resource heavy elements on a dedicated core and the rest on thread 0. This info might not be relevant to your use case but I am just adding that here for posterity's sake.

Also, if the element timers need to run on say the same thread 1, then the actual element also needs to be scheduled via StaticThreadSched and not just the pull to push converter like Unqueue.

Oct 21 '20 15:10 ahenning

I appreciate your help @ahenning. I'm playing with this now, but I realized that even though I ran ./configure --enable-userlevel --disable-linuxmodule --enable-user-multithread --enable-multithread

Click was not built with multithreading enabled, and I'm trying to see why.

Oct 21 '20 16:10 p4pe

Did you make clean then make again? Weird.

Oct 22 '20 10:10 tbarbette

I did a fresh installation on a new machine, and now it is OK.

Oct 23 '20 14:10 p4pe

Hello @tbarbette, I'm trying to pin elements to threads but I got this error: router configuration specified twice

My Click configuration file is:
FromDevice(enp4s0f1) -> Queue -> ToDevice(enp4s0f0); StaticThreadSched(FromDevice 1, ToDevice 0);

And I'm running Click with: click -a 2 -j 2 forwarder.click

Nov 09 '20 14:11 p4pe

Hi,

First, you need to create an instance of From/ToDevice as follows:

in:: FromDevice(enp4s0f1); out :: ToDevice(enp4s0f0);

Then describe your pipeline:

in -> Queue -> out;

Finally, pin each instance to the correct thread:

StaticThreadSched(in 1, out 0);

The way you did it, Click creates a different instance for every From/ToDevice call. This is why the output states that router configuration is specified twice.

Nov 09 '20 14:11 gkatsikas

Thank you @gkatsikas, but I have the same issue with this click configuration file:

in::FromDevice(enp4s0f1); out::ToDevice(enp4s0f0);

in -> Queue -> out;

StaticThreadSched(in 1, out 0);

Nov 09 '20 14:11 p4pe

I'm following the issue because I also have to study a very similar case. My setup is as follows: I have three different nodes in the order below, each running a different click configuration Node1 (source) --> Node2 --> Node3 (sink)

Node1: Source.click

FastUDPSource(800000, 10000000, 60, 3c:fd:fe:04:64:42, 192.168.6.2, 1234, 14:18:77:26:68:15, 192.168.6.5, 1234) -> ToDevice(eth1);

Node2: in::FromDevice(eth1); out::ToDevice(eth2); in -> Queue -> out; StaticThreadSched(in 1, out 0);

Node3: FromDevice(eth1) -> c :: Counter -> Discard; DriverManager(wait 45s, save c.rate -, stop);

On node2 we are running Click with the command click -j 2 forwarder.click, and the approximate rate at node3 is around 360619,21

Whereas running it with click -a -j 2 the rate is 467960,45

What is the actual difference between these two commands? I mean, how does triggering the affinity switch work, and why does it change the rate?

Also, trying to use both the affinity and thread switches like this: click -a 2 -j 2 forwarder.click returns the same error, "router configuration specified twice", whereas if the affinity switch is given but left empty, i.e. click -a -j 2 forwarder.click, it runs normally.

How would I go about pinning the threads to two cores a) of a CPU in one socket, b) in different sockets?

Kindly thank you @tbarbette @ahenning and @gkatsikas for your input

Nov 13 '20 20:11 IoakeimFotoglou

I'd advise keeping "-a" empty, and playing with the affinity inside Click. If you want to offset by two, just pin elements to threads 2 and 3. As for the performance: without -a you leave the OS switching threads around, and it's actually not very good at that.

With the forwarder using two cores, I'd expect the sink or source to become the bottleneck. But I'd advise using DPDK as soon as performance matters.

Nov 16 '20 14:11 tbarbette

Random advice:

- FastUDPSource also takes a STOP argument that would kill Click when it finishes generating packets, so you're sure you're not taking the rate while forwarding no traffic. That, or use a LIMIT of -1.
- Similarly, I'm more fond of AverageCounter than Counter, as it takes the rate between the first and last packet. In the FastClick branch it has "link_rate", which gives the rate in bps.
- Check with htop what the bottleneck is on the 3 machines :)
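
As a rough sketch of the AverageCounter suggestion, the sink could look like the fragment below. The interface name eth1 and the 45-second wait are carried over from the sink configuration posted earlier in the thread; printing the rate with DriverManager's print instruction is an assumption about how you want the result reported.

```click
// sink.click: measure the rate between the first and the last packet seen.
// eth1 and the 45 s wait are placeholders taken from the earlier example.
FromDevice(eth1) -> c :: AverageCounter -> Discard;

// After 45 seconds, print the average packet rate and stop the driver.
DriverManager(wait 45s, print c.rate, stop);
```

Because the rate is averaged only over the interval in which packets actually arrived, idle time before or after the test should not dilute the number.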

Nov 16 '20 14:11 tbarbette

@IoakeimFotoglou I see we have the same issues. @tbarbette I will follow your advice too.

Every time I try to play with the affinity inside Click I get this warning: forwarder.click:6: While configuring ‘StaticThreadSched@4 :: StaticThreadSched’: warning: thread preference 2 out of range

I'm using the configuration that @gkatsikas proposed.

Thank you in advance

Nov 16 '20 16:11 p4pe

With the configuration

in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0);

in -> Queue -> out;

StaticThreadSched(in 0, out 1);

and click -a -j 2 forwarder.click, it works fine.

With the configuration

in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0);

in -> Queue -> out;

StaticThreadSched(in 2, out 3);

and click -a -j 2 forwarder.click

I have the issue I mentioned above.

Nov 16 '20 16:11 p4pe

Well, in the second configuration you explicitly ask for threads 2 and 3 in StaticThreadSched, but you call Click with only 2 threads (i.e., -j 2, which implies that only threads 0 and 1 will be allocated). If you bump -j to 4 instead of 2 it should work.

Nov 16 '20 16:11 gkatsikas

Obviously I did not understand something correctly.

What I had understood so far is: if I have StaticThreadSched(in 0, out 0), it means that I have two threads running on one core (core 0).

If I have StaticThreadSched(in 0, out 1), this means that I ask for two threads (0, 1), and with -a -j 2 in the call, this configuration runs on cores 0 and 1.

The "conflict" comes when I try to run Click on cores other than 0 and 1. I configured StaticThreadSched(in 2, out 3) and I thought that this means that Click will run on cores 2 and 3.

Kindly thank you for your input and advices @gkatsikas

Nov 16 '20 17:11 p4pe

No, the thread index in StaticThreadSched does not necessarily correspond to a physical CPU core ID, it is simply a thread count. To have full control on how to pin those threads to a physical core, I suggest that you use the DPDK-based FastClick instead of Click.

Nov 16 '20 17:11 gkatsikas

Ok.. Thanks for the explanation. I want to use Click first.

So what do you suggest for better management of core pinning? Maybe using the "taskset" command?

If I want 2 threads on one core, I would have StaticThreadSched(in 0, out 0), run click -j 2, and then taskset -cp 2 PID or something like this.

Nov 16 '20 17:11 p4pe

Your first two points were correct. I think what you missed is that a Click thread can run multiple elements. It's like user-level threads. So with

StaticThreadSched(in 0, out 0)

You pin the two elements to thread 0. As you pass -a, thread 0 means core 0. Core 1 is there but does nothing. Similarly, you can pass -j 4 and assign threads 2 and 3 to the in and out elements; threads 0 and 1 will do nothing. It is not correct to assign elements to threads 2 and 3 if you launched Click with 2 threads, as those indexes are out of bounds. That is the warning you get.

Taskset will not work, because if two elements are on the same thread there is nothing you can do about it.

For completeness, -a takes a parameter that offsets the assignment of threads to cores. With -a 2, thread 0 will be pinned to core 2, and thread 1 to core 3. So in that case you would pin in and out to threads 0 and 1, which will be running on cores 2 and 3.

What DPDK adds is the ability to define a list of cores: if you pass 3,7,10, thread 0 would be pinned to core 3, thread 1 to core 7 and thread 2 to core 10.

My suggestion would be to run Click with -a -j 16 if you have 16 cores and never think about this anymore. You pin elements to thread indexes that correspond exactly to cores. If a core has nothing assigned to it then it won't run anything, and you don't really care...
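
To make the mapping concrete, here is a minimal sketch, assuming the forwarder from earlier in the thread and a machine with at least 4 cores. The thread-to-core lines in the comments simply restate the explanation above (with a bare -a, thread N runs on core N); they are not output from Click.

```click
// forwarder.click; interface names are the ones used earlier in this thread.
in  :: FromDevice(enp6s0f1);
out :: ToDevice(enp6s0f0);
in -> Queue -> out;

// Valid thread indexes are 0 .. (value of -j) - 1.
// Launched as: click -a -j 4 forwarder.click
//   thread 0 -> core 0 (nothing assigned, stays idle)
//   thread 1 -> core 1 (nothing assigned, stays idle)
//   thread 2 -> core 2 (runs "in")
//   thread 3 -> core 3 (runs "out")
StaticThreadSched(in 2, out 3);
```

Launching the same file with -j 2 would leave indexes 2 and 3 out of range, which matches the "thread preference 2 out of range" warning reported above.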

Nov 16 '20 22:11 tbarbette

Ok now I think that I get it.

Say I want to run FromDevice and ToDevice in two different threads, and assign these threads to different cores that are on different sockets.

If we assume that core 2 and core 4 are on different sockets, the configuration will be StaticThreadSched(in 2, out 4)

And with click -a -j 16 I will have what I want.

Thank you @tbarbette

Nov 17 '20 08:11 p4pe

Your configuration should work even with -j 5. Note that pinning FromDevice to socket 0 and ToDevice to socket 1 implies inter-socket communication, which is costly in terms of performance. In your case it does not matter though, as you use vanilla Click, which can hardly stress QPI.

Nov 17 '20 08:11 gkatsikas

I know @gkatsikas, this performance degradation is exactly what I want to observe!

I have 4 different scenarios:

1. FromDevice and ToDevice running on the same core, without hyperthreading
2. On the same core, with hyperthreading
3. On different cores in the same socket
4. On different cores in different sockets

If I'm right, scenario (4) will have the worst performance (more or less) due to the inter-socket communication.

Nov 17 '20 08:11 p4pe

Yes, this is likely the case, although QPI effects may be obscured by some artefacts of your setup, such as memory copies from/to user space. You may also try kernel-based Click or fast user-space Click (with DPDK) to eliminate this overhead.

Nov 17 '20 08:11 gkatsikas

Unfortunately I did not manage to install kernel-based Click (I think it is not compatible with new Linux headers). The next step is to try Click with DPDK, but first I have to take a look at DPDK because I am a newbie.

One last question, just for confirmation. For my first scenario I just have

in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0); in -> Queue -> out;

without StaticThreadSched

And I just run click forwarder.click

Nov 17 '20 08:11 p4pe

Yes (provided that you have disabled HT)

Nov 17 '20 08:11 gkatsikas

I'd say to always pin them, even for case 1.

Also you have to consider that without DPDK you're not pinning actually most of the RX work. Packets will be received by the kernel on probably all cores (the default for the NIC is to use as many queues as cores) through the interrupts handler, no matter how the application is pinned. They will go through the kernel stack on all those cores, this is some heavy work, before the app reads the packets from a single given core. Therefore if you really want to test QPI with the kernel sockets, you'll need to consider the number of queues (ethtool -L) and irq affinity.

Just a thought: similarly, as your device is attached to a specific CPU, the packets will actually never be moved to the second core in the setup you present, just the Click metadata. You may want to "touch" the bytes on the second CPU. "CheckIPHeader -> SetTCPChecksum" (or SetUDPChecksum) should do the trick.
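
A possible way to wire that in, as a sketch only. It assumes UDP traffic (as generated by FastUDPSource), the interface names used earlier in the thread, and a launch with click -a -j 2; the extra Queue/Unqueue pair is there so that the checksum elements run on the second thread rather than on the receiving one.

```click
// Placeholder interface names from the forwarder used in this thread.
in  :: FromDevice(enp6s0f1);
out :: ToDevice(enp6s0f0);

// "in" only enqueues packets. "unq" pulls them out of the first Queue and
// pushes them through the checksum elements, so the packet bytes are read
// on whichever thread runs "unq".
in -> Queue
   -> unq :: Unqueue
   -> CheckIPHeader(14)   // 14 = Ethernet header length; sets the IP header annotation
   -> SetUDPChecksum      // assumes UDP packets; recomputes the checksum over the payload
   -> Queue
   -> out;

// Run with: click -a -j 2 forwarder.click
// Receive on thread 0; touch the bytes and transmit on thread 1.
StaticThreadSched(in 0, unq 1, out 1);
```

Note that CheckIPHeader drops packets that fail its checks, so the rate at the sink is only comparable across scenarios if the generated traffic is well formed.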

Nov 17 '20 10:11 tbarbette

Thank you all for your help, guys.

I believe I managed to install FastClick, and now I will try to run the same "experiment" and see the difference.

If I understood well, the only change I have to make is to replace ToDevice and FromDevice with ToDPDKDevice and FromDPDKDevice, using the interfaces that are bound to DPDK, and then run Click with click --dpdk.

Dec 02 '20 10:12 p4pe

Mostly yes. Also, you don't need a Queue ;)
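
A minimal sketch of what the DPDK variant might look like in FastClick. The port indexes 0 and 1 are assumptions (they depend on which NICs are bound to DPDK), the file name is hypothetical, and the EAL arguments in the launch comment are illustrative only; check the FastClick README for the options your build expects.

```click
// forwarder-dpdk.click (hypothetical file name)
// 0 and 1 are DPDK port indexes, not Linux interface names.
// The two elements are connected directly, with no Queue in between.
FromDPDKDevice(0) -> ToDPDKDevice(1);

// Possible launch line (EAL options after --dpdk are assumptions):
//   click --dpdk -l 0-1 -- forwarder-dpdk.click
```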

Dec 02 '20 14:12 tbarbette

Hi. I am studying a case similar to yours. I want to ask whether, in FastUDPSource, SRCETH is node1's Ethernet address, SRCIP is node1's IP address, DSTETH is node2's Ethernet address, and DSTIP is node2's IP address?

Jan 16 '21 08:01 Memtwo

Hello, in my case SRCETH and SRCIP are the MAC and the IP of node1, but DSTETH and DSTIP are those of node3 (sink).

My topology is source--->VNF--->sink

Jan 16 '21 08:01 p4pe

Thanks! But what script runs on your node2 (VNF)? A simple in -> Queue -> out? How is the packet sent to node2 by node1, and then sent to node3 by node2? Sorry, I'm a newbie to Click and these network problems.

Jan 16 '21 08:01 Memtwo

Yes just FD->Queue->TD.

You have to enable promiscuous mode on the in and out interfaces so that all the traffic is able to pass.

Jan 16 '21 09:01 p4pe

But it seems my packets are directly sent to node3 by node1 and ignore node2.

Jan 16 '21 10:01 Memtwo

If you run tcpdump on the ingress port of node2, what do you get?

Jan 16 '21 10:01 p4pe

Oh, using tcpdump I can see the packets from node1 to node3, so I think it works. Thank you very much!

Jan 16 '21 10:01 Memtwo

You are welcome! You can also use the IPPrint element, for checking.

Jan 16 '21 12:01 p4pe

Sorry to trouble you again after a long time. I am still a newbie to Click and suspect my setup doesn't work well. Here's my setup: Node1 (source) --> Node2 (forward) --> Node3 (sink). I want my packets to be transmitted hop by hop.

The three nodes' information is as follows:

Node1 ens33: 192.128.32.128 00:0c:29:92:68:92 ; ens38: 192.168.32.129 00:0c:29:92:68:9c
Node2 ens33: 192.168.32.130 00:0c:29:57:e6:e1 ; ens38: 192.168.32.131 00:0c:29:57:e6:eb
Node3 ens33: 192.168.32.132 00:0c:29:db:6e:56 ; ens38: 192.168.32.133 00:0c:29:db:6e:60

Node1: Source.click

FastUDPSource(800000, -1, 60, 00:0c:29:92:68:92, 192.168.32.128, 1234, 00:0c:29:db:6e:56, 192.168.32.132, 1234) ->IPPrint("Hello") ->ToDevice(ens33);

Node2: Forward.click

FromDevice(ens38, PROMISC true) -> Queue -> IPPrint -> ToDevice(ens33);

Node3: Sink.click

FromDevice(ens33, PROMISC true) -> c:: Counter -> Discard; Script(wait 10, print c.rate, loop);

I can see the result on node3. When I use tcpdump on node2 I see the packet 192.168.32.128.1234 > 192.168.32.132.1234, but with the IPPrint element I see nothing. How can I make sure that the packets are sent from node1 to node2, and then from node2 to node3? Sorry again for bothering you.

Mar 03 '21 15:03 Memtwo

Hello, I think you have to rewrite the MAC address at every node a packet arrives at.

Mar 04 '21 14:03 p4pe

Thanks! Do you mean I should set node1's DST to node2, then on node2 use the EtherRewrite element and set its DST to node3?
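
A sketch of what that could look like on node2, assuming the MAC addresses listed above (node2's ens33 as the new source, node3's ens33 as the new destination) and that node1 now sends to the MAC of node2's ens38. MarkIPHeader(14) is an addition not discussed in this thread: IPPrint relies on the IP header annotation, which FromDevice does not set on its own, so that may be why IPPrint showed nothing on node2.

```click
// Forward.click on node2 (MAC addresses taken from the node list above).
FromDevice(ens38, PROMISC true)
    -> MarkIPHeader(14)     // 14 = Ethernet header length; lets IPPrint find the IP header
    -> IPPrint("node2")
    -> EtherRewrite(00:0c:29:57:e6:e1, 00:0c:29:db:6e:56)  // SRC = node2 ens33, DST = node3 ens33
    -> Queue
    -> ToDevice(ens33);
```

If lines labelled "node2" are printed while the source is running, the packets are passing through this Click instance rather than reaching node3 directly.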

Mar 05 '21 04:03 Memtwo
