Unexpectedly large flow files generated

Open FusionFC opened this issue 10 years ago • 1 comments

When processing a pcap file containing multiple flows that used the same IP/port pairs but different MAC addresses, tcpflow generated unusually large flow files (a 6.2K pcap produced a flow file 1.2G in size).

In a different data set that simply contained a standard port-reuse example, unexpectedly large flow files were also generated (a 1.5M pcap produced multiple 2G flow files).

The issue appears to be in tcpdemux.cpp. The code correctly detects a "new" flow from the sequence number gap (delta is now something large), removes the old flow, and sets tcp to 0 (this should probably be NULL for consistency).
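
The gap detection described above can be sketched in isolation. This is a minimal illustration, not the tcpflow code: the helper names seq_delta and gap_too_big are hypothetical, and the sequence numbers are assumed to be 32-bit values whose difference is reinterpreted as a signed offset.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: signed distance from the next expected sequence
// number (nsn) to the sequence number of the arriving segment (seq).
// The subtraction wraps modulo 2^32; the cast then reinterprets the result
// as signed (two's complement, guaranteed since C++20 and universal in practice).
static int32_t seq_delta(uint32_t seq, uint32_t nsn) {
    return static_cast<int32_t>(seq - nsn);
}

// Hypothetical helper: true when the gap, in either direction, exceeds the
// seek limit. Widening to 64 bits first keeps the absolute value well defined
// even for INT32_MIN.
static bool gap_too_big(int32_t delta, int64_t max_seek) {
    return std::llabs(static_cast<int64_t>(delta)) > max_seek;
}
```

For example, seq_delta(1100, 1000) is 100 and seq_delta(1000, 1100) is -100, while a jump such as seq_delta(0x50000000, 1000) produces a delta of more than a billion bytes, which any reasonable limit would reject.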

We are now in case 4 of the code, so a new flow is created. A comment there says "delta will be 0", but in case 4 delta is not 0; it still holds the large gap.

After that we eventually reach tcp->store_packet with the large delta, and so we seek a large distance into the newly created file. I believe delta needs to be reset when the flow is removed, so that the packet is stored at the beginning of the new flow's file.
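
The difference between carrying the stale delta forward and resetting it can be shown with a small model. This is a simplified sketch under stated assumptions: GapResult and check_gap are hypothetical names, and the model only tracks which delta would later reach store_packet, not the actual flow database.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical, simplified model of the gap check; not the tcpflow code itself.
struct GapResult {
    bool    flow_removed;  // true when the gap exceeded max_seek
    int64_t store_delta;   // delta that would later be handed to store_packet()
};

// reset_delta selects between the reported behavior (false) and the
// proposed fix (true).
static GapResult check_gap(uint32_t seq, uint32_t nsn,
                           int64_t max_seek, bool reset_delta) {
    // Signed offset from the next expected sequence number.
    int64_t delta = static_cast<int32_t>(seq - nsn);
    bool removed = false;
    if (std::llabs(delta) > max_seek) {
        removed = true;              // the old flow is dropped here
        if (reset_delta) delta = 0;  // proposed fix: the new flow starts at offset 0
    }
    return {removed, delta};
}
```

With an assumed limit of 16 MiB, check_gap(0x50000000, 1000, limit, false) reports the flow as removed but still carries a delta of 1342176280, whereas the same call with reset_delta set to true carries 0.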

I'm currently testing with this change in place (delta = 0 added where the flow is removed):

    /* flow is in the database; make sure the gap isn't too big.*/
    if(tcp){
        /* Compute delta based on next expected sequence number.
         * If delta will be too much, start a new flow.
         *
         * NOTE: I hope we don't get a packet from the old flow when
         * we are processing the new one. Perhaps we should be able to have
         * multiple flows at the same time with the same quad, and they are
         * at different window areas...
         * 
         */
        delta = seq - tcp->nsn;         // notice that signed offset is calculated

        if(abs(delta) > opt.max_seek){
            remove_flow(this_flow);
            delta = 0;
            tcp = 0;
        }
    }

    /* At this point, tcp may be NULL because:
     * case 1 - It's a new connection and SYN IS SET; normal case
     * case 2 - Extra packets on a now-closed connection
     * case 3 - Packets for which the initial part of the connection was missed
     * case 4 - It's a connection that had a huge gap and was expired out of the database
     *
     * THIS IS THE ONLY PLACE THAT create_tcpip() is called.
     */

    /* q: what if syn is set AND there is data? */
    /* q: what if syn is set AND we already know about this connection? */

    if (tcp==NULL){

        /* Don't process if this is not a SYN and there is no data. */
        if(syn_set==false && tcp_datalen==0) return 0;

        /* Create a new connection.
         * delta will be 0, because it's a new connection!
         */
        be13::tcp_seq isn = syn_set ? seq : seq-1;
        tcp = create_tcpip(this_flow, isn, pi);
    }

    /* Now tcp is valid */
    tcp->myflow.tlast = pi.ts;          // most recently seen packet
    tcp->last_packet_number = packet_counter++;
    tcp->myflow.packet_count++;

    /*
     * 2012-10-24 slg - the first byte is sent at SEQ==ISN+1.
     * The first byte in POSIX files have an LSEEK of 0.
     * The original code overcame this issue by introducing an intentional off-by-one
     * error with the statement tcp->isn++.
     * 
     * With the new TCP state-machine we simply follow the spec.
     *
     * The new state machine works by examining the SYN and ACK packets
     * in accordance with the TCP spec.
     */
    if(syn_set){
        /* If the syn is set this is either a SYN or SYN-ACK. We use this information to set the direction
         * flag, but that's it. The direction flag is only used for coloring.
         */
        if(tcp->syn_count>1){
            DEBUG(2)("Multiple SYNs (%d) seen on connection %s",tcp->syn_count,tcp->flow_pathname.c_str());
        }
        tcp->syn_count++;
        if( !ack_set ){
            DEBUG(50) ("packet is handshake SYN"); /* First packet of three-way handshake */
            tcp->dir = tcpip::dir_cs;   // client->server
        } else {
            DEBUG(50) ("packet is handshake SYN/ACK"); /* second packet of three-way handshake  */
            tcp->dir = tcpip::dir_sc;   // server->client
        }
        if(tcp_datalen>0){
            tcp->violations++;
            DEBUG(1) ("TCP PROTOCOL VIOLATION: SYN with data! (length=%d)",(int)tcp_datalen);
        }
    }
    if(tcp_datalen==0) DEBUG(50) ("got TCP segment with no data"); // seems pointless to notify

    /* process any data.
     * Notice that this typically won't be called for the SYN or SYN/ACK,
     * since they both have no data by definition.
     */
    if (tcp_datalen>0){
        if (opt.console_output) {
            tcp->print_packet(tcp_data, tcp_datalen);
        } else {
            if (opt.store_output){
                tcp->store_packet(tcp_data, tcp_datalen, delta,pi.ts);
            }
        }
    }
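
The file sizes reported above are consistent with how a file behaves when the first write lands at a large offset: its size becomes the offset plus the bytes written, however little data there is. The following sketch shows this with plain C stdio; it is an illustration only and assumes nothing about how store_packet performs its seek. The helper name size_after_write_at is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: creates a fresh file at `path`, writes `len` bytes at
// byte offset `offset`, and returns the resulting file size, or -1 on error.
static long size_after_write_at(const char* path, long offset,
                                const char* data, std::size_t len) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return -1;
    // Seek past the (empty) end of the file, then write the payload there.
    bool ok = std::fseek(f, offset, SEEK_SET) == 0 &&
              std::fwrite(data, 1, len, f) == len;
    ok = (std::fclose(f) == 0) && ok;   // always close, then combine the status
    if (!ok) return -1;

    // Reopen and measure the size by seeking to the end.
    f = std::fopen(path, "rb");
    if (!f) return -1;
    long size = -1;
    if (std::fseek(f, 0, SEEK_END) == 0) size = std::ftell(f);
    std::fclose(f);
    return size;
}
```

Writing 4 bytes at offset 0 gives a 4-byte file, while writing the same 4 bytes at an offset of 1 MiB gives a file of 1 MiB plus 4 bytes. Scaled up to a delta of a gigabyte or more, a single small packet is enough to produce the flow file sizes reported here.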

FusionFC, Sep 08 '15 15:09

Thanks for your comments on this. I will appreciate any comments you have on the code.

— Reply to this email directly or view it on GitHub https://github.com/simsong/tcpflow/issues/106.

simsong, Sep 09 '15 01:09