Enabling MRI Options header in every packet breaks the flow

Open sbibek opened this issue 5 years ago • 3 comments

Hi,

I want to embed the MRI options header in every packet that passes through the switch instead of using the probe packets. I am trying to do it the following way.

(h1)-----------------(s1)---------------(s2)-----------(h2)

If an incoming packet from h1 does not have the MRI headers enabled, the switch enables them, which adds a total of 4 bytes (options header + MRI header), and updates the ihl and total length accordingly. The traffic is then sent to h2 (the destination). The simplest code to do that:


    action add_swtrace(switchID_t swid) {
        // Grow the IPv4 header only once: 4 bytes = one 32-bit ihl word.
        if (!hdr.mri.isValid()) {
            hdr.ipv4.ihl = hdr.ipv4.ihl + 1;
            hdr.ipv4.totalLen = hdr.ipv4.totalLen + 4;
        }
        hdr.mri.setValid();
        hdr.ipv4_option.setValid();

        hdr.mri.count = (bit<16>)0;
        hdr.ipv4_option.option = IPV4_OPTION_MRI;
        hdr.ipv4_option.optionLength = (bit<8>)4;
    }

Adding only the options header and MRI header seems to break applications such as iperf, ping, and even the sender-receiver probe in the MRI example. So I created another action that drops those headers at switch 2 before the packet reaches host 2.

    action drop_mri() {
        hdr.mri.setInvalid();
        hdr.ipv4_option.setInvalid();
        // Undo the 4-byte growth applied by add_swtrace.
        hdr.ipv4.ihl = hdr.ipv4.ihl - 1;
        hdr.ipv4.totalLen = hdr.ipv4.totalLen - 4;
    }

So the overall logic is as follows for the traffic originating from h1 to h2

  1. The traffic reaches s1. The MRI header is not set, so add_swtrace marks the options and MRI headers as valid and updates the lengths accordingly.
  2. The traffic reaches s2. add_swtrace has no effect because the MRI header is already valid, but drop_mri is also executed here, which invalidates the options and MRI headers and then decrements the lengths.
  3. The traffic reaches h2 without the options/MRI fields.
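
The three steps above can be modeled outside the switch to see what each link carries. This is a minimal Python sketch, not the P4 program itself: `Ipv4Fields`, `trace_path`, and the `has_mri` flag are hypothetical names introduced for illustration, and the starting `ihl` of 5 assumes h1 sends packets without IP options.

```python
from dataclasses import dataclass
from typing import List

OPTION_BYTES = 4  # 2-byte IPv4 option header + 2-byte MRI count field


@dataclass
class Ipv4Fields:
    ihl: int          # header length in 32-bit words
    total_len: int    # header + payload length in bytes
    has_mri: bool = False


def add_swtrace(pkt: Ipv4Fields) -> None:
    """Model of add_swtrace: grow the header only if MRI is not yet present."""
    if not pkt.has_mri:
        pkt.ihl += OPTION_BYTES // 4
        pkt.total_len += OPTION_BYTES
    pkt.has_mri = True


def drop_mri(pkt: Ipv4Fields) -> None:
    """Model of drop_mri: remove the headers and restore the lengths."""
    pkt.has_mri = False
    pkt.ihl -= OPTION_BYTES // 4
    pkt.total_len -= OPTION_BYTES


def trace_path(total_len: int) -> List[int]:
    """Return total_len as seen on the h1-s1, s1-s2, and s2-h2 links."""
    pkt = Ipv4Fields(ihl=5, total_len=total_len)
    sizes = [pkt.total_len]      # h1 -> s1
    add_swtrace(pkt)             # s1 attaches the headers
    sizes.append(pkt.total_len)  # s1 -> s2
    add_swtrace(pkt)             # s2: no change, MRI already present
    drop_mri(pkt)                # s2 detaches the headers
    sizes.append(pkt.total_len)  # s2 -> h2
    return sizes
```

Calling `trace_path(n)` for different packet sizes shows that the hosts see the original length while the s1-s2 link always carries 4 bytes more.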

The result expected

Since the options/MRI header is attached and detached within the switches and the end hosts never see it, all kinds of traffic should work fine, and this implementation should not break applications such as iperf, ping, etc.

Result observed

  1. Ping works (h1 <-> h2).
  2. I removed the options field from the sender (sender.py) in the MRI example and tested it. The receiver receives the UDP packets, so the sender-receiver application in the MRI example also works with the above change.
  3. iperf (tcp/udp) is not working. The initial handshake completes without any problem, but the flow stops after a certain time. I checked this with Wireshark, and the iperf TCP flow stops at exactly the 24th packet almost every time.

My reasoning

  1. I am modifying the IPv4 header, which is used by ping and by UDP-based applications. If those applications work, then adding and dropping the options/MRI header should be fine.
  2. But the same thing is happening for other traffic (iperf, TCP, etc.), so why does the flow break in this case? Is it because those applications might be using options themselves? Or is there something wrong with my approach?

What I am trying to do

I am trying this method to calculate the link latency at each hop. Each hop updates the MRI header with its egress timestamp so that the next hop can calculate the link latency as (hop2.ingress_timestamp - hop1.egress_timestamp), and the result is stored in a register.
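
The subtraction described above can be sketched as follows, in Python for illustration only. The 48-bit width is an assumption based on the `bit<48>` timestamp fields in v1model's standard metadata, and `clock_offset_us` is a hypothetical parameter: the difference is only meaningful when both switches share a time base, and bmv2 timestamps appear to count microseconds from each switch's own start, so some offset correction would likely be needed.

```python
TS_BITS = 48            # assumed timestamp width (v1model uses bit<48>)
TS_MOD = 1 << TS_BITS   # timestamps wrap around at this value


def link_latency_us(prev_egress_ts, ingress_ts, clock_offset_us=0):
    """Latency of one link: ingress time at this hop minus egress time at
    the previous hop, computed modulo 2**48 so that counter wraparound
    does not produce a negative value.

    clock_offset_us is the (hypothetical, separately measured) amount by
    which this hop's clock runs ahead of the previous hop's clock.
    """
    return (ingress_ts - prev_egress_ts - clock_offset_us) % TS_MOD
```

For example, `link_latency_us(1000, 1250)` gives 250 microseconds when the two clocks are aligned.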

Are there any alternatives to this approach of tagging information onto every packet that traverses the switch?

Thanks

sbibek • Oct 22 '20 18:10

The best approach I know for debugging such things is to try to find the first packet that is being sent from a host that is not getting through, and for that packet, the first switch where it is either being dropped, or sent out of a port where from that point onwards, it never reaches the destination you hope it does.

This can take time, but at least with the default settings of running these tutorials, there are pcap files recorded on every host and switch port for the packets that go over them, and there is a detailed bmv2 log file recorded for every switch that shows how the P4 program processed every received packet, step by step.
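
One way to speed up reading those pcap files is to list the IPv4 header length and total length of every recorded packet, then compare the output for adjacent ports. The following is a rough sketch using only the Python standard library. It assumes the recorded files are classic pcap (not pcapng) with microsecond timestamps and Ethernet framing; `ipv4_lengths` is a hypothetical helper name, not part of the tutorials.

```python
import struct

PCAP_GLOBAL_HDR = 24   # bytes in the pcap file header
PCAP_RECORD_HDR = 16   # bytes in each per-packet record header
ETH_HDR = 14           # Ethernet header without VLAN tags
ETHERTYPE_IPV4 = 0x0800


def ipv4_lengths(pcap_bytes):
    """Return [(packet_number, ihl_words, total_len_bytes)] for every
    IPv4 frame in a classic pcap capture. Packet numbers start at 1."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond pcap file")

    result = []
    offset = PCAP_GLOBAL_HDR
    number = 0
    while offset + PCAP_RECORD_HDR <= len(pcap_bytes):
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        offset += PCAP_RECORD_HDR
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        number += 1
        if len(frame) < ETH_HDR + 20:
            continue   # too short to hold an IPv4 header
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        if ethertype != ETHERTYPE_IPV4:
            continue   # skip ARP, IPv6, and so on
        ihl = frame[ETH_HDR] & 0x0F
        (total_len,) = struct.unpack_from("!H", frame, ETH_HDR + 2)
        result.append((number, ihl, total_len))
    return result
```

Reading a capture with `open(path, "rb").read()` and passing the bytes to `ipv4_lengths` gives one tuple per IPv4 packet, which makes it easier to spot where a packet's size or header length changes between ports.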

The process of narrowing down the first problematic packet can be made quicker if you can find a ping command that fails, that you think should succeed, that you run first after starting things up. If you run several successful pings before the failing one, then the pcap files and bmv2 log files contain a lot of information about packets that are probably working fine, that you would prefer not to wade through.

jafingerhut • Oct 23 '20 21:10

Thanks. I found the issue: it is the MSS crossing the limit. Are there any standard rules about what should or should not be done if the MSS grows beyond the limit when additional information is attached to traffic?

sbibek • Oct 28 '20 23:10

If you mean a TCP MSS, then I think most people avoid modifying the TCP payload in most methods of tinkering with packets.

If you mean the interface's MTU, i.e. the maximum supported packet length: IP fragmentation typically kills performance in end hosts, which must then do IP reassembly. So in networks where a known number of bytes can be added to packets by various kinds of tunneling protocols, a common approach is to set the MTU on the hosts X bytes lower than the network MTU, where X is the maximum number of bytes you expect network devices to add to a packet.
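
The arithmetic behind that rule can be sketched in a few lines of Python. The 1500-byte network MTU below is an assumed Ethernet default, not a value taken from this setup; the 4 bytes are the option + MRI header size mentioned earlier in the thread, and the 20-byte header sizes assume IPv4 and TCP without options. The function names are hypothetical.

```python
IPV4_MIN_HEADER = 20  # bytes, IPv4 header without options
TCP_MIN_HEADER = 20   # bytes, TCP header without options


def host_mtu(network_mtu, max_added_bytes):
    """MTU to configure on hosts so that packets still fit the network
    MTU after devices add up to max_added_bytes (the X above)."""
    return network_mtu - max_added_bytes


def tcp_mss(mtu, ip_header=IPV4_MIN_HEADER, tcp_header=TCP_MIN_HEADER):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - ip_header - tcp_header
```

With an assumed network MTU of 1500 and X = 4, `host_mtu(1500, 4)` is 1496, and `tcp_mss(1496)` is 1456, compared with 1460 for an unreduced 1500-byte MTU.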

jafingerhut • Oct 29 '20 01:10