
The structure of ns2 simulation code?

Open liulalala opened this issue 6 years ago • 17 comments

Hi behnamm, I'm wondering about the structure of your ns2 simulation code; could you provide a README? Also, I cannot find the source code that handles priorities; could you give me a path? Thanks a lot!

liulalala avatar Sep 10 '18 13:09 liulalala

Hi @liulalala Sorry for the very late reply. The ns2 code is only used for PIAS and pFabric comparisons. Homa is not simulated in ns2, only in omnet++.

If you are interested in pFabric or PIAS simulations, you need to check out ns2_pfabric or pias branches. These branches populate the code for those simulations and fill RpcTransportDesign/ns2_Simulations/scripts/ directory with simulation scripts.

I'll update the README to include this information.

Cheers,

behnamm avatar Oct 02 '18 17:10 behnamm

Thanks a lot for your reply! I am interested in your Homa project! I looked into the omnet++ code, and: 1. I cannot figure out what cbf means (as in getRemainSizeCdfCbf and getCbfFromCdf). 2. What does defaultReqBytes mean? Does it indicate that the sender sends request packets first (containing the number of requested packets)? But the paper says (in 3.2): "an initial unscheduled portion, followed by a scheduled portion", and no request portion is involved. Looking forward to your reply! Thanks :)

liulalala avatar Oct 08 '18 08:10 liulalala

Hi behnamm, sorry to disturb you again. I'm trying to run the Homa code. I followed the README to set up, but what is the default input file (./homatransport xxx.ini)? I also wonder about the structure of the Homa code. Looking forward to your reply!

liulalala avatar Oct 17 '18 14:10 liulalala

@liulalala First, make sure you know enough about omnet++ and how to configure and run simulations from the command line (refer to the omnet++ manual on the omnet++ website). Then, in order to run Homa, after you build the simulation package, go to the RpcTransportDesign/OMNeT++Simulation/homatransport/src/dcntopo folder and run your simulation scenario from there. Here is an example of how to run a single configuration:

../homatransport -u Cmdenv -c WorkloadHadoop -r 6 -n ..:../../simulations:../../../inet/examples:../../../inet/src -l ../../../inet/src/INET homaTransportConfig.ini

"-u Cmdenv" tells OMNeT++ not to run the simulation in the GUI. homaTransportConfig.ini at the end of the command is the configuration file we use, and "-c WorkloadHadoop" asks omnet to use the parameters specified in the WorkloadHadoop section of the config file. -r 6 specifies that run number 6 within that section is to be simulated.

behnamm avatar Oct 17 '18 18:10 behnamm

Thanks a lot for your reply. I wonder whether the receiver sends grants for every unscheduled packet and request packet, or only for the last unscheduled packet?

liulalala avatar Oct 29 '18 09:10 liulalala

Grant packets are transmitted one packet at a time, for every single data packet that arrives. So, for each unscheduled packet (including the request packet) that arrives at the receiver, a new grant packet is sent. However, grants are only sent for a message if the message belongs to the high priority set of messages that the receiver is actively granting. Please read the paper for more information.
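To illustrate the mechanism described above, here is a minimal sketch of per-packet granting (the names `ToyReceiver`, `onDataPktArrival`, and the data layout are mine for illustration, not the simulator's actual API):

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical sketch: a receiver that sends one grant for every arriving
// data packet, but only for messages in its actively granted high-priority set.
struct ToyReceiver {
    std::set<uint64_t> activeSet;          // message ids being actively granted
    std::map<uint64_t, uint32_t> granted;  // bytes granted so far per message
    uint32_t grantSize;                    // bytes authorized per grant packet

    explicit ToyReceiver(uint32_t grantBytes) : grantSize(grantBytes) {}

    // Returns true if this data packet's arrival triggers a new grant.
    bool onDataPktArrival(uint64_t msgId) {
        if (activeSet.count(msgId) == 0)
            return false;              // not in the high-priority set: no grant
        granted[msgId] += grantSize;   // one grant per received data packet
        return true;
    }
};
```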

Cheers,

behnamm avatar Oct 29 '18 16:10 behnamm

Thanks a lot for your reply! And I am sorry to bother you again... I am reading the paper and code carefully, but I still cannot figure out some details.

  1. In HomaTransport::initialize(), there is uint32_t maxDataBytes = MAX_ETHERNET_PAYLOAD_BYTES - IP_HEADER_SIZE - UDP_HEADER_SIZE - dataPkt.headerSize(); if (homaConfig->grantMaxBytes > maxDataBytes) { homaConfig->grantMaxBytes = maxDataBytes; }. That means a grant will grant at most one MTU's worth of bytes, according to the meaning of grantMaxBytes (and I cannot find the value of maxDataBytes in homaTransportConfig.ini). But in the paper, Section 3.3 Flow control says that the offset is chosen so that there are always RTTbytes of data in the message that have been granted but not yet received. That seems to mean a grant can cover multiple data packets?
  2. In HomaTransport::ReceiveScheduler::processReceivedPkt(), an oversubscription period is mentioned. Does that mean you can open or close oversubscription depending on the network situation? Could you explain it in detail? Looking forward to your reply :)

liulalala avatar Nov 01 '18 07:11 liulalala

@liulalala Happy to help. Find the responses inline.

Thanks a lot for your reply! And I am sorry to bother you again... I am reading the paper and code carefully, but I still cannot figure out some details.

  1. In HomaTransport::initialize(), there is uint32_t maxDataBytes = MAX_ETHERNET_PAYLOAD_BYTES - IP_HEADER_SIZE - UDP_HEADER_SIZE - dataPkt.headerSize(); if (homaConfig->grantMaxBytes > maxDataBytes) { homaConfig->grantMaxBytes = maxDataBytes; }. That means a grant will grant at most one MTU's worth of bytes, according to the meaning of grantMaxBytes (and I cannot find the value of maxDataBytes in homaTransportConfig.ini). But in the paper, Section 3.3 Flow control says that the offset is chosen so that there are always RTTbytes of data in the message that have been granted but not yet received. That seems to mean a grant can cover multiple data packets?

Note that a grant may allow transmission of multiple data packets, but that doesn't mean we don't send grants on a per packet basis. As I said before, grants are sent on a per packet basis and in the common case when grants are not delayed, we expect that a new scheduled packet is sent for every new grant packet that arrives at the sender. What you are referring to in the paper is an optimization for when two grant packets G1 and G2 are reordered in the network and the later grant G2 arrives earlier than G1 at the sender. To compensate for the reordering of the grants, with arrival of G2, we allow transmission of two scheduled packets instead of one. The offset you refer to is a way to implement this effect. That said, while we have implemented this optimization in the RAMCloud implementation, we didn't implement this in the simulations. So in the simulations, we can only transmit one scheduled packet for every new grant.
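A minimal sketch of that offset-based compensation, as I understand it from the description above (the `ToySender` type and its fields are mine, not the RAMCloud or simulator code):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: each grant carries an absolute byte offset up to which
// the sender may transmit. If grants G1 and G2 are reordered and G2 (the larger
// offset) arrives first, the sender immediately transmits everything up to
// G2's offset, compensating for the reordering; the late G1 becomes a no-op.
struct ToySender {
    uint32_t grantedOffset = 0;  // highest offset authorized by any grant
    uint32_t sentBytes = 0;      // bytes already transmitted

    // Returns how many bytes the sender may transmit after this grant arrives.
    uint32_t onGrant(uint32_t offset) {
        grantedOffset = std::max(grantedOffset, offset);  // stale grants ignored
        uint32_t canSend = grantedOffset - sentBytes;
        sentBytes = grantedOffset;
        return canSend;
    }
};
```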

  2. In HomaTransport::ReceiveScheduler::processReceivedPkt(), an oversubscription period is mentioned. Does that mean you can open or close oversubscription depending on the network situation? Could you explain it in detail?

This is a feature I added for collecting statistics and computing the wasted bandwidth. It doesn't have any effect on the algorithm and Homa mechanisms. You don't need to worry about this.

Looking forward for your reply :)

behnamm avatar Nov 01 '18 23:11 behnamm

Got it! Thank you so much! Another question: how is the priority of the scheduled packets determined? As I understand it: first, maintain a candidate list whose length is the overcommitment level, containing the flows with the fewest remaining bytes to receive, and update it each time a data packet is received. Then, every time a data packet is received, check whether the head of the list (the highest-priority flow) has a grant to send (i.e., its bytes on the wire are fewer than RTTbytes); if not, check the second one, and so on. If no flow in the candidate list can be granted, the data packet triggers no grant. I am not sure whether my understanding is right. I also wonder how the priority of scheduled packets is decided, in other words, the prio field in the grant's header. For example, if we send a grant whose list position is sId, is the prio field set to sId + #unscheduled priorities? Then what does "always use the lowest scheduled priorities" mean? And in HomaTransport::ReceiveScheduler::SenderState::sendAndScheduleGrant, there is grantPrio = std::min(grantPrio, (uint32_t)resolverPrio); might this result in scheduled packets using the priority of unscheduled packets?

liulalala avatar Nov 04 '18 14:11 liulalala

@liulalala responses inlined

Got it! Thank you so much! Another question: how is the priority of the scheduled packets determined? As I understand it: first, maintain a candidate list whose length is the overcommitment level, containing the flows with the fewest remaining bytes to receive, and update it each time a data packet is received. Then, every time a data packet is received, check whether the head of the list (the highest-priority flow) has a grant to send (i.e., its bytes on the wire are fewer than RTTbytes); if not, check the second one, and so on. If no flow in the candidate list can be granted, the data packet triggers no grant.

That should work, although this is not exactly how I implemented it in the simulator. The simulator also sends grants based on a timer: when one packet time has passed, we check if we can send a grant for any of the active messages, subject to conditions such as whether the message is among the top scheduled messages and whether it has fewer than RTTBytes of outstanding bytes. The simulation code contains more than what we discussed in the paper, which may make it difficult to understand. I would suggest looking at the RAMCloud implementation of Homa for cleaner code.
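The timer-driven check described above could be sketched like this (the `Msg` struct, `pickGrantTarget`, and its parameters are illustrative names of mine, not the simulator's code):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: every packet time, scan the active messages in SRPT
// order and grant to the first one that is within the overcommitment level
// and has fewer than RTTBytes of granted-but-unreceived bytes.
struct Msg {
    uint64_t id;
    uint32_t remaining;    // bytes left to receive
    uint32_t outstanding;  // granted but not yet received
};

// Returns the id of the message to grant, or -1 if none qualifies.
int64_t pickGrantTarget(std::vector<Msg> msgs, size_t overcommit, uint32_t rttBytes) {
    std::sort(msgs.begin(), msgs.end(),
              [](const Msg& a, const Msg& b) { return a.remaining < b.remaining; });
    size_t limit = std::min(overcommit, msgs.size());
    for (size_t i = 0; i < limit; i++)
        if (msgs[i].outstanding < rttBytes)
            return static_cast<int64_t>(msgs[i].id);
    return -1;  // no message in the top set can absorb a grant right now
}
```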

I am not sure whether my understanding is right. I also wonder how the priority of scheduled packets is decided, in other words, the prio field in the grant's header. For example, if we send a grant whose list position is sId, is the prio field set to sId + #unscheduled priorities? Then what does "always use the lowest scheduled priorities" mean?

It basically means that as the top-priority message in the list completes, you push the remaining messages up in the list. That means a new place opens up in the list at the lowest priority level, so if a new message arrives, it is inserted into the list at the lowest priority level. Section 3.4 of the paper explains this.
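As a toy illustration of this list behavior (the `SchedPrioList` type and method names are mine, not the simulator's):

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical sketch of "always use the lowest scheduled priorities": a
// message's scheduled priority is its position in the active list. When the
// top message completes, the rest shift up, so the slot that opens is at the
// bottom and a newly admitted message enters at the lowest priority.
struct SchedPrioList {
    std::deque<uint64_t> active;  // index 0 = highest scheduled priority

    void complete() { if (!active.empty()) active.pop_front(); }  // rest shift up
    void admit(uint64_t msgId) { active.push_back(msgId); }       // lowest prio
    int prioOf(uint64_t msgId) const {
        for (size_t i = 0; i < active.size(); i++)
            if (active[i] == msgId) return static_cast<int>(i);
        return -1;  // not in the active list
    }
};
```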

And in HomaTransport::ReceiveScheduler::SenderState::sendAndScheduleGrant, there is grantPrio = std::min(grantPrio, (uint32_t)resolverPrio); might this result in scheduled packets using the priority of unscheduled packets?

This relates to an optimization that may not be explained in the paper. Because of this optimization, the last RTTBytes of a scheduled message get an unscheduled priority level. That makes sense because, from the perspective of a receiver doing SRPT, the last RTTbytes of a message are as important as its first RTTBytes. So we assign an unscheduled priority level to the last RTTBytes of the scheduled portion. Hope this makes sense.
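A condensed sketch of the effect of that min() (the function `grantPriority` and its parameters are mine for illustration; lower numbers mean higher priority):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the last-RTTBytes boost: a scheduled packet's
// priority is normally its slot in the scheduled range (slot + number of
// unscheduled levels), but for the final RTTBytes of a message the higher
// unscheduled-style priority from the resolver wins via std::min.
uint32_t grantPriority(uint32_t schedSlot, uint32_t numUnschedPrios,
                       uint32_t resolverPrio, uint32_t remaining,
                       uint32_t rttBytes) {
    uint32_t grantPrio = schedSlot + numUnschedPrios;   // normal scheduled prio
    if (remaining <= rttBytes)
        grantPrio = std::min(grantPrio, resolverPrio);  // last RTTBytes: boost
    return grantPrio;
}
```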

behnamm avatar Nov 05 '18 22:11 behnamm

Thank you so much! Sorry to bother you again. Do you mean that for scheduled packets that do not belong to the last RTT, the priority is calculated as sId + #unscheduled priorities, and for the last RTTbytes, the priority is calculated as grantPrio = std::min(sId + #unscheduled priorities, (uint32_t)resolverPrio)? Another detailed question: how do the W1-W5 workloads in the paper correspond to the workload files? There are many workload files in the sizeDistributions folder, such as FABRICATED_HEAVY_MIDDLE.txt and FABRICATED_HEAVY_HEAD.txt. And how many flows are there in one simulation?

liulalala avatar Nov 07 '18 03:11 liulalala

Thank you so much! Sorry to bother you again. Do you mean that for scheduled packets that do not belong to the last RTT, the priority is calculated as sId + #unscheduled priorities, and for the last RTTbytes, the priority is calculated as grantPrio = std::min(sId + #unscheduled priorities, (uint32_t)resolverPrio)?

Yes, that's correct.

Another detailed question: how do the W1-W5 workloads in the paper correspond to the workload files? There are many workload files in the sizeDistributions folder, such as FABRICATED_HEAVY_MIDDLE.txt and FABRICATED_HEAVY_HEAD.txt. And how many flows are there in one simulation?

W1 -> FacebookKeyValueMsgSizeDist.txt
W2 -> Google_SearchRPC.txt
W3 -> Google_AllRPC.txt
W4 -> Facebook_HadoopDist_All.txt
W5 -> DCTCP_MsgSizeDist.txt

To save space in the paper, the results for the rest of the workloads in that folder were not reported.

behnamm avatar Nov 07 '18 04:11 behnamm

Thanks! A few small questions that I'm confused about:

  1. Do the workload files give flow sizes in bytes or in packet counts? I guess in bytes? But for W5, the CDF file (DCTCP_MsgSizeDist.txt) suggests otherwise: according to the paper, the flows in DCTCP are large, so the sizes in DCTCP_MsgSizeDist.txt seem to represent flow sizes in packet counts?
  2. There are 8 priority levels in the custom switch queue, with the highest reserved for signals like grants and requests, leaving 7 priority levels for unscheduled and scheduled packets, right? But in the DCTCP flow pattern, adaptiveSchedPrioLevels = 7; does that mean unscheduled and scheduled flows share one priority level? As we discussed before, that seems to make sense.

liulalala avatar Nov 07 '18 08:11 liulalala

Thanks! A few small questions that I'm confused about:

  1. Do the workload files give flow sizes in bytes or in packet counts? I guess in bytes? But for W5, the CDF file (DCTCP_MsgSizeDist.txt) suggests otherwise: according to the paper, the flows in DCTCP are large, so the sizes in DCTCP_MsgSizeDist.txt seem to represent flow sizes in packet counts?

Correct! The original DCTCP search workload from the DCTCP paper is specified in terms of packet counts rather than bytes. That's why the file for this workload is also in terms of packet counts, but the simulator takes care of transforming the workload from packets to bytes.
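The packets-to-bytes transformation could look roughly like this (a sketch under my own assumptions; `packetsToBytes` and the CDF representation are illustrative, not the simulator's actual code):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: a workload CDF given in packet counts (like the DCTCP
// search workload) is converted to bytes by scaling each size by the maximum
// packet payload; the cumulative probabilities are unchanged.
std::vector<std::pair<uint64_t, double>>
packetsToBytes(const std::vector<std::pair<uint64_t, double>>& cdfInPkts,
               uint32_t payloadBytes) {
    std::vector<std::pair<uint64_t, double>> cdfInBytes;
    for (const auto& p : cdfInPkts)
        cdfInBytes.emplace_back(p.first * payloadBytes, p.second);  // size, CDF
    return cdfInBytes;
}
```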

  2. There are 8 priority levels in the custom switch queue, with the highest reserved for signals like grants and requests, leaving 7 priority levels for unscheduled and scheduled packets, right? But in the DCTCP flow pattern, adaptiveSchedPrioLevels = 7; does that mean unscheduled and scheduled flows share one priority level? As we discussed before, that seems to make sense.

No, there's no distinct priority reserved for grants. The grants share the highest priority level with some of the highest-priority unscheduled packets that belong to the smallest messages. The load that grant packets put on the network is taken into account when dividing the priorities among the unscheduled and scheduled packets. For example, in the DCTCP workload, the unscheduled packets, the grants, and the last RTTBytes of scheduled packets all share the single highest priority level out of the 8 total priority levels available. The remaining 7 priority levels are all used for scheduled packets (i.e. adaptiveSchedPrioLevels = 7). Hope this makes things clear.

behnamm avatar Nov 08 '18 00:11 behnamm

Thanks! Another question: what does the 99% slowdown mean for each x-axis value?

liulalala avatar Nov 12 '18 07:11 liulalala

As I understand it, the 99% slowdown is computed by first sorting the flows in ascending order, then taking the 99th-percentile flow's completion time divided by its oracle completion time. But I cannot figure out what the 99% slowdown means for each particular flow size (as the x-axis shows: 2 3 5 11...).

liulalala avatar Nov 18 '18 05:11 liulalala

So, imagine we run the experiment at a specific load factor (e.g. 80%) for long enough that, for every single message size in the workload, we have generated thousands of instances of that size and measured the latency of each instance. Now we sort the latencies for that message size and find the 99th percentile and the minimum among them. Divide the 99th-percentile latency by the minimum latency and you have the 99%ile slowdown for that message size. This was explained in the paper, but if that wasn't clear, I hope this makes it clear.
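The per-size computation described above can be sketched in a few lines (the function name and the index formula are my own simplifications):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the per-size 99th-percentile slowdown: given all
// latency samples for one message size, the slowdown is the 99th-percentile
// latency divided by the minimum (best-case) latency for that size.
double slowdown99(std::vector<double> latencies) {
    std::sort(latencies.begin(), latencies.end());
    size_t idx = static_cast<size_t>(0.99 * (latencies.size() - 1));
    double best = latencies.front();  // minimum latency for this message size
    return latencies[idx] / best;     // 99%ile latency / minimum latency
}
```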

behnamm avatar Dec 14 '18 19:12 behnamm