ns3-rdma
Some questions about how modules work in the simulator.
Hi. I have run main.exe correctly, and now I want to implement an algorithm on the simulator. Can you explain how the simulator works, e.g., the relationship between qbb-device and broadcom-node, and the role of qbb-device in the simulation? I also want to know where, and what, decides when to send a PFC packet. Thank you for your reply.
Each qbb-net-device is an L2 device, which can be a switch OR a NIC. If it's a switch, it will have a broadcom-node (m_broadcom) and a broadcom queue (m_queue) attached to it. (This is not accurate, see the correction below.)
If it's a switch, upon receiving packets it asks m_broadcom whether there is still space for the new packet. If so, it pushes the packet into m_queue and asks m_broadcom whether the PFC threshold has been reached. If so, it sends a PFC PAUSE upstream.
Upon sending packets, it gets a packet from m_queue and asks m_broadcom whether the queue length has fallen below the PFC threshold. If so, it sends a PFC RESUME.
If it's a NIC, it will perform DCQCN rate control.
You can find most of these details in QbbNetDevice::Receive and QbbNetDevice::DequeueAndTransmit.
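If it helps, here is a self-contained toy model of that enqueue/dequeue logic. This is NOT the repo's code; all class names, thresholds, and numbers below are made up for illustration:

```cpp
#include <cstdint>
#include <iostream>
#include <queue>

// Toy MMU: tracks buffer usage and the two PFC thresholds.
struct ToyMmu {
  uint32_t capacity     = 12000; // total ingress buffer (bytes)
  uint32_t pauseThresh  = 8000;  // send PAUSE when usage rises above this
  uint32_t resumeThresh = 4000;  // send RESUME when usage falls below this
  uint32_t used         = 0;

  bool Admit (uint32_t bytes) const { return used + bytes <= capacity; }
};

int main () {
  ToyMmu mmu;                    // plays the role of m_broadcom
  std::queue<uint32_t> q;        // plays the role of m_queue
  bool paused = false;

  // Ingress side (what the receive path does): admission check, enqueue, PAUSE check.
  auto enqueue = [&] (uint32_t bytes) {
    if (!mmu.Admit (bytes)) { std::cout << "no space, drop\n"; return; }
    q.push (bytes);
    mmu.used += bytes;
    if (!paused && mmu.used > mmu.pauseThresh) {
      paused = true;
      std::cout << "PFC PAUSE to upstream (used=" << mmu.used << ")\n";
    }
  };

  // Egress side (what DequeueAndTransmit does): dequeue, RESUME check.
  auto dequeue = [&] () {
    if (q.empty ()) return;
    mmu.used -= q.front ();
    q.pop ();
    if (paused && mmu.used < mmu.resumeThresh) {
      paused = false;
      std::cout << "PFC RESUME to upstream (used=" << mmu.used << ")\n";
    }
  };

  for (int i = 0; i < 9; ++i) enqueue (1500); // a burst arrives
  for (int i = 0; i < 9; ++i) dequeue ();     // the egress port drains it
}
```

The real logic lives in QbbNetDevice and BroadcomNode, but the shape is the same: an admission check at ingress, PAUSE above one threshold, RESUME after draining below a lower one.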
Oh, so it's very different from the switch mechanism in NS-3, like the bridge module, and every port of the switch is also abstract, just counting how many bytes it receives? OK, but I found the pause_time parameter. So does the upstream resume sending based on the RESUME message from downstream, or on pause_time? Maybe they work together, i.e., the pause time is the last barrier against dropped packets? And what do terms like sp, pg, and rpr mean? SP for strict priority? pg for priority guarantee? Thank u very much :)
Sorry, I think I made a small mistake above. qbb-net-device is either a NIC port or a switch port. Please bear with me... I wrote this code long ago. The broadcom-node and broadcom queue are attached to the node, not the qbb-net-device. A node can have multiple qbb-net-devices (especially on a switch), which share the same m_broadcom and m_queue.
pg is Priority Group; some people call it a priority class, or simply "priority". sp is Service Pool, a shared buffer that multiple pgs can share. These terms come from Broadcom.
The rpr... stuff is from the QCN standard. You can refer to http://www.cs.ucr.edu/~mart/204/802-1au-d2-4.pdf (page 96, figure 32-2).
Thanks first. I studied the code, but I'm still confused. I looked at QbbNetDevice::Receive, but I can't find where it asks m_broadcom to check whether there is space. I found that check in QbbNetDevice::Send() instead, which seems unreasonable, because it should be in Receive(); also, I don't know who invokes Send(), or where. And I'm eager to know how and where PFC is triggered. It seems to be very late in the USA; have a good night.
You are right... they are in Send(), because Send() is where packets are put into the sending queue (MMU), ready to be sent to the next hop. Send() is invoked by the upper layer.
The name "send" is inherited from point-to-point-device. It's called "send" because the packet is actually leaving this qbb-net-device (the ingress port). On a switch, the packet is heading towards another qbb-net-device (the egress port on the same switch). For that to happen, it must be buffered in the broadcom queue module, where it waits for the egress qbb-net-device to invoke DequeueAndTransmit() and fetch it.
DequeueAndTransmit() is where the packet actually leaves the switch.
Does "the upper layer" that invokes Send() refer to broadcom-node or the broadcom queue? I still can't find where Send() is invoked. To implement my algorithm, I'm eager to know where PFC is triggered and how the PFC mechanism works.
PFC generation is in QbbNetDevice::CheckQueueFull.
QbbNetDevice::Receive handles received PFC.
The implemented PFC mechanism is just the standard one. Once a port is paused, it either waits for a RESUME or waits for the PFC pause timeout.
Send() is invoked by the common NS-3 pipeline. Nothing special either; it just overrides its parent class method PointToPointNetDevice::Send().
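To illustrate the pause/timeout interaction you asked about earlier (RESUME vs pause_time), here is a tiny standalone model. It is not the repo's code, and all names and numbers are made up:

```cpp
#include <cstdint>
#include <iostream>

// Toy paused-queue state: a PAUSE sets a deadline; an explicit RESUME or
// the expiry of pause_time (whichever comes first) re-enables sending.
struct ToyPausedQueue {
  bool     paused = false;
  uint64_t deadlineNs = 0;

  void OnPfcPause (uint64_t nowNs, uint64_t pauseTimeNs) {
    paused = true;
    deadlineNs = nowNs + pauseTimeNs;  // pause_time is the last barrier
  }
  void OnPfcResume () { paused = false; }

  bool MaySend (uint64_t nowNs) {
    if (paused && nowNs >= deadlineNs)
      paused = false;                  // pause timed out, resume anyway
    return !paused;
  }
};

int main () {
  ToyPausedQueue q;
  q.OnPfcPause (0, 5000);                  // PAUSE at t=0, pause_time=5us
  std::cout << q.MaySend (1000) << "\n";   // 0: still paused
  q.OnPfcResume ();                        // RESUME arrives before timeout
  std::cout << q.MaySend (2000) << "\n";   // 1: sending again
}
```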
Yeah, I found that BroadcomNode::GetPauseClasses decides whether a priority queue should send PFC.
And I also noticed dynamic_pfc_threshold, so what's the difference between dynamic and common PFC? What does the parameter m_pg_shared_alpha_cell mean in the dynamic case?
Also, can I know the meaning of this code?
It's the dynamic PFC threshold from Broadcom. In the DCQCN paper http://yibozhu.com/doc/dcqcn-sigcomm15.pdf: "The Trident II chipset in our switch allows us to configure a parameter β such that..."
That β is m_pg_shared_alpha_cell. The paper calls it β because DCQCN already has an α; Broadcom calls this value alpha.
If you don't understand it and don't need it, you may disable the dynamic threshold in config.txt.
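In case it helps, here is the idea of the dynamic threshold as a standalone calculation, using the variable names from this thread. The numbers are made up; check broadcom-node.cc for the exact expression:

```cpp
#include <iostream>

int main () {
  // Made-up numbers, in buffer cells.
  double m_pg_shared_alpha_cell = 0.5;    // Broadcom's "alpha" (the paper's beta)
  double m_buffer_cell_limit_sp = 10000;  // limit of guaranteed + shared pool
  double usedIngressSPBytes     = 4000;   // current usage of the service pool

  // The pause threshold shrinks as the shared pool fills up, which is the
  // whole point of a *dynamic* threshold: under heavy load, pause earlier.
  double threshold =
      m_pg_shared_alpha_cell * (m_buffer_cell_limit_sp - usedIngressSPBytes);
  std::cout << "dynamic PFC threshold = " << threshold << " cells\n";
}
```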
The first picture you showed is in the UDP client. It is pushing packets to the lower layer (to be buffered there), not actually transmitting them out at layer 2. Just think of what happens in a real OS: when you call a socket send(), the packet is not immediately sent out; it is buffered locally, waiting for the NIC to actually handle it.
Because the simulator is testing full throughput, we keep the buffer at the end host non-empty. As a result, the NIC at layer 2 will just transmit packets at line rate, regardless of the timing with which the UDP client pushed them. I added a random interval here just to help the case where multiple UDP flows start from the same end host.
If you want to test applications that randomly send out a few packets, you need to edit the application layer. If you want to test what happens when a NIC or switch misbehaves and makes packet intervals random, you need to edit qbb-net-device (layer 2).
Pick the right layer to work on... Keep in mind that when you see a send(), it just means sending from THIS layer to another layer (unless you are at layer 1/2). It does not mean the packet has left the device.
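Here is a tiny standalone model of why the app-side send() timing does not shape the wire: the NIC drains its buffer at line rate whenever the buffer is non-empty. All numbers are made up:

```cpp
#include <cstdio>

int main () {
  // The app hands packets to the local buffer at irregular times...
  double appTimesNs[] = { 0, 50, 900 };
  // ...but the NIC serializes each packet in a fixed time at line rate.
  double nsPerPkt  = 300;
  double nicFreeAt = 0;  // when the NIC finishes its current packet

  for (int i = 0; i < 3; ++i) {
    // A packet goes on the wire when both it and the NIC are ready.
    double start = appTimesNs[i] > nicFreeAt ? appTimesNs[i] : nicFreeAt;
    std::printf ("pkt %d: app at %4.0f ns, on wire %4.0f-%4.0f ns\n",
                 i, appTimesNs[i], start, start + nsPerPkt);
    nicFreeAt = start + nsPerPkt;
  }
  // Packets 0 and 1 leave back-to-back regardless of the app's 50 ns gap.
}
```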
Hi, it's nice of you to answer me every time. Now I want to reproduce the experiments from the DCQCN paper (SIGCOMM 2015). Besides the topology and data rate that I set in config.txt, are there any other parameters I need to set? For example, broadcom-node.h contains many threshold parameters.
You don't need to modify anything in the code. Just edit config files.
OK, I have tried modifying flow.txt, topology.txt, and trace.txt. Can I know the meaning of the output in mix.tr? For example, this is a line in mix.tr from the default config:
2.000002 /1 1.2>1.1 u 32795 0 3
What does each segment mean? And by the way, where can I modify this output?
Hi, I want to know where to set the parameter "feedback_delay". Is it NP_SAMPLING_INTERVAL? Or what is NP_SAMPLING_INTERVAL used for?
what is "feedback_delay"? You can add link latency in topology.txt. If you want the NIC to delay sending ACKs, you can take a look at qbb-net-device, edit the place where NIC generates ACK.
NP_SAMPING_INTERVAL was for modeling older (and weaker) NICs. Sometimes they cannot capture all ECN marks. For example, when they capture one ECN, they have to spend some time processing it and cannot capture another ECN within some interval. New NICs do not have this limitation. So just keep it 0.
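If you want the intuition as code, here is a standalone model of such an interval-limited NIC. This is illustrative only, not the simulator's implementation:

```cpp
#include <cstdint>
#include <iostream>

// An old NIC that, after processing one ECN mark, cannot capture another
// for `intervalNs` nanoseconds. Setting intervalNs = 0 models a new NIC
// with no such limitation.
struct ToyEcnSampler {
  uint64_t intervalNs;
  uint64_t nextAllowedNs;

  bool Capture (uint64_t nowNs) {
    if (intervalNs == 0) return true;        // new NIC: sees every mark
    if (nowNs < nextAllowedNs) return false; // still busy: mark is missed
    nextAllowedNs = nowNs + intervalNs;
    return true;
  }
};

int main () {
  ToyEcnSampler oldNic{ 1000, 0 };            // 1 us processing interval
  std::cout << oldNic.Capture (0)             // 1: captured
            << oldNic.Capture (500)           // 0: missed
            << oldNic.Capture (1500) << "\n"; // 1: captured again
}
```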
The trace format is: timestamp, node_being_traced, src_ip>dst_ip, u=udp, port#, sequence#, priority.
So in the example above: 2.000002 is the timestamp, /1 is the node being traced, 1.2>1.1 is src_ip>dst_ip, u means UDP, 32795 is the port, 0 is the sequence number, and 3 is the priority.
The code for the trace output is scattered across the various xxx-header.cc files. For example, at the IP layer, you can find the src_ip>dst_ip part in the Ipv4Header::Print() method here:
https://github.com/bobzhuyb/ns3-rdma/blob/master/src/internet/model/ipv4-header.cc
If you configure a node to be traced in trace.txt, NS-3 automatically calls every header's Print() function on every packet on this node.
Hi, I want to get the latency of every packet, so where should I work, and how do I get a packet's timestamp? I have gotten the packet id and timestamp from the SeqTs header, but I don't know the format of the timestamp. It seems to be a TimeStep value; how can I transform it into something comparable with Simulator::Now()?
TimeStep is NS-3's data structure, so this isn't really the place to ask about it. Search online... or use Visual Studio's "go to definition" or "find all references".
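That said, the usual pattern looks like this. A minimal sketch assuming the standard NS-3 Time API, where SeqTsHeader::GetTs() and Simulator::Now() both return ns3::Time, so you can subtract them directly; double-check against this repo's NS-3 version:

```cpp
#include <iostream>
#include "ns3/simulator.h"
#include "ns3/seq-ts-header.h"

using namespace ns3;

// Hypothetical helper: call this wherever you receive the packet and have
// already removed (or peeked) the SeqTsHeader from it.
void PrintLatency (const SeqTsHeader &seqTs)
{
  Time sent  = seqTs.GetTs ();             // timestamp written by the sender
  Time delay = Simulator::Now () - sent;   // both are ns3::Time, so subtract
  std::cout << "seq " << seqTs.GetSeq ()
            << " latency " << delay.GetMicroSeconds () << " us" << std::endl;
}
```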
Hi, I found that the value m_pg_shared_alpha_cell * ((double)m_buffer_cell_limit_sp - m_usedIngressSPBytes[GetIngressSP(port, qIndex)]) ... becomes negative when I have ten 40Gb flows and set QCN 0, dynamic PFC 1. Why does it become negative? Can I get some instructions or a manual about the switch configuration? I don't know the actual meaning of these variables.
total buffer = guaranteed buffer + shared buffer + headroom buffer (you can search "PFC headroom" to learn more about it)
buffer_cell_limit_sp is the threshold for the guaranteed + shared buffer. When the buffer is very full and some headroom is in use, this term becomes negative.
I am sorry that I cannot send you Broadcom's confidential documents. You could directly ask them for the document of the chipset you want.
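But the negative value itself is just arithmetic. A made-up numeric example (none of these numbers are real chipset values):

```cpp
#include <iostream>

int main () {
  // Made-up sizes in cells; real values come from the switch config.
  double guaranteed = 2000, shared = 6000, headroom = 2000;
  double buffer_cell_limit_sp = guaranteed + shared;  // 8000

  // Under heavy PFC load, in-flight packets land in headroom, so usage
  // can exceed the guaranteed + shared limit:
  double used = 9000;

  std::cout << "total buffer  = " << guaranteed + shared + headroom << "\n"
            << "limit - used  = " << buffer_cell_limit_sp - used
            << "  (negative because headroom is in use)\n";
}
```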
Hi, I notice there are different mechanisms for packet dequeue, like dequeueNIC, dequeueQCN, and dequeueRR. Can I know the difference between them and how they work together? Thank you!
Check qbb-net-device.cc, where these functions are called.
Some of them are for the NIC, some of them are for switches. Some of them use round robin (i.e., RR) to decide which priority should send the next packet; some of them use strict priority.
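Here is the difference between the two selection policies as a standalone toy (illustrative; these are not the repo's dequeue functions):

```cpp
#include <array>
#include <cstddef>
#include <iostream>
#include <queue>

constexpr std::size_t kPrios = 3;
using Queues = std::array<std::queue<int>, kPrios>;

// Strict priority: always drain the highest (lowest-index) non-empty queue.
int DequeueSP (Queues &q) {
  for (auto &pq : q)
    if (!pq.empty ()) { int p = pq.front (); pq.pop (); return p; }
  return -1;
}

// Round robin: rotate across priorities so each one gets a turn.
int DequeueRR (Queues &q, std::size_t &next) {
  for (std::size_t i = 0; i < kPrios; ++i) {
    auto &pq = q[(next + i) % kPrios];
    if (!pq.empty ()) {
      next = (next + i + 1) % kPrios;
      int p = pq.front (); pq.pop (); return p;
    }
  }
  return -1;
}

int main () {
  Queues q;
  for (int prio = 0; prio < 3; ++prio)
    for (int k = 0; k < 2; ++k) q[prio].push (prio);
  std::size_t next = 0;
  std::cout << "SP: "; for (int i = 0; i < 3; ++i) std::cout << DequeueSP (q);
  std::cout << "  RR: "; for (int i = 0; i < 3; ++i) std::cout << DequeueRR (q, next);
  std::cout << "\n";   // SP keeps draining priority 0 first; RR rotates
}
```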
Hi, I want to know when a packet enters and departs the switch. Where should I print the time? Thank you!
Check the NS-3 tracing methods, like m_phyTxEndTrace, m_snifferTrace, etc., in qbb-net-device.cc.
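For example, assuming qbb-net-device keeps the standard point-to-point trace sources, you could hook one of them like this (a sketch only; verify the trace source names in qbb-net-device.cc):

```cpp
#include <iostream>
#include "ns3/simulator.h"
#include "ns3/packet.h"

using namespace ns3;

// Prints the time a packet finishes transmission on a port.
static void OnPhyTxEnd (Ptr<const Packet> p)
{
  std::cout << Simulator::Now ().GetSeconds () << "s: packet "
            << p->GetUid () << " departed" << std::endl;
}

// After creating a device in main(), connect with something like:
//   dev->TraceConnectWithoutContext ("PhyTxEnd", MakeCallback (&OnPhyTxEnd));
// A matching "MacRx"/"PhyRxEnd" hook gives you the arrival time.
```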
Hi,
I met some strange problems.
I thought the delay between when a packet is generated and when it enters the switch would equal the link_delay I set in the configuration file (only one hop), but I found it does not.
Then I tried to find out the reason, but I can't understand some of the code in the Send() function in udp-client.cc:
Why does the send interval vary each time? Is this for adjusting the speed? And what does the parameter buffer mean?
Thank you!
This piece of code controls how packets enter the UDP sender's local buffer. Since that buffer's usage can change, the time it takes for a packet to reach the first switch can vary.
Please check again this answer: https://github.com/bobzhuyb/ns3-rdma/issues/3#issuecomment-257757728
Thank you for your patient reply :) Now, to do some more experiments, I need to run the simulation on Linux. How can I do this? Can I just copy the source code into the original NS-3 code on Linux and compile?
You need to edit the wscript file of each module, e.g., https://github.com/bobzhuyb/ns3-rdma/blob/master/src/point-to-point/wscript
There may also be slight differences between gcc and vc++, but it should not be hard to fix the code.
The most convenient way is actually to use WINE and run the exe binary directly. WINE 1.6.2 (you can install it using apt-get on Ubuntu 16.04) should work just fine. Put the binary and config files in the same folder, and run:
wine64 main.exe config.txt
Oh, do you mean I still use VS to fix and compile the source code, and then get a main.exe every time? Then I copy the .exe file and config files to Linux and use WINE to run them? I will try, thank you.
Yes.