PcapPlusPlus

Create a TCP reordering module

Open seladb opened this issue 7 years ago • 11 comments

In some cases, TCP packets get scrambled on the network and don't arrive at the other side in the order in which they were sent. The purpose of this module is to re-order the packets the way they were sent, that is, according to the TCP stream order. This sounds a little like TCP reassembly, but the difference is that this module should only re-order packets, nothing more. It should be simpler and more lightweight than TCP reassembly.
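
To make the intent concrete, here is a minimal sketch of such a re-ordering buffer. It is only an illustration under stated assumptions: it covers one direction of one connection, the initial sequence number is known up front, segments do not overlap, and SYN/FIN sequence-number consumption is ignored. `TcpSegment` and `ReorderBuffer` are hypothetical names and not part of the PcapPlusPlus API.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a captured packet: only what the sketch needs.
struct TcpSegment
{
    std::uint32_t seq;    // sequence number of the first payload byte
    std::string payload;  // TCP payload bytes
};

// Re-orders the segments of ONE direction of ONE connection.
class ReorderBuffer
{
public:
    explicit ReorderBuffer(std::uint32_t initialSeq) : m_next(initialSeq) {}

    // Feed one segment; returns every segment that is now in stream order.
    std::vector<TcpSegment> push(TcpSegment seg)
    {
        std::vector<TcpSegment> released;
        // Unsigned subtraction copes with sequence-number wraparound: a
        // distance of 2^31 or more means the segment lies behind m_next.
        if (static_cast<std::uint32_t>(seg.seq - m_next) >= 0x80000000u)
            return released;  // old data (e.g. a retransmission): dropped here

        const std::uint32_t key = seg.seq;
        m_pending.emplace(key, std::move(seg));

        // Release the run of segments that starts exactly at m_next.
        for (auto it = m_pending.find(m_next); it != m_pending.end();
             it = m_pending.find(m_next))
        {
            m_next += static_cast<std::uint32_t>(it->second.payload.size());
            released.push_back(std::move(it->second));
            m_pending.erase(it);
        }
        return released;
    }

    std::size_t pending() const { return m_pending.size(); }

private:
    std::uint32_t m_next;                           // next expected sequence number
    std::map<std::uint32_t, TcpSegment> m_pending;  // segments that arrived early
};
```

Unlike reassembly, nothing is copied into a contiguous stream here: segments are only held back until the ones before them have been seen, then handed out unchanged.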

seladb avatar Mar 22 '18 22:03 seladb

I have checked #82. Is there any progress?

wtdcode avatar Feb 21 '19 02:02 wtdcode

I still haven't got to it. But if you can find time to build it it'd be great

seladb avatar Feb 21 '19 06:02 seladb

Hi, @seladb! I have thought about this task and want to share some thoughts with you. The general reordering module must solve the following problem:

Given some data G represented as an ordered sequence S(K) = {{p0, g0}, ..., {pK, gK}} of parts {pi, gi}, each assigned a distinct, increasing part number pi (pi < pi+1 in the mathematical sense). Let S'(K) be a permutation of the original set S(K). Let s(N) be a subset of S'(K) with cardinality N <= K that keeps the relative order of the elements of S'(K). Let q(M) be a subset of s(N) with cardinality M <= N that does not keep the relative order of the elements of s(N). The general reorderer (GR) is "fed" a sequence of parts selected randomly from s(N) or q(M), respecting the relative order of the elements within each set ("as if" a current pointer is assigned to each set, and which one is advanced and dereferenced is chosen randomly). It must reconstruct the original ordered set S(K), that is, stream all known parts in the same order as in S(K), as soon as that becomes possible, and report any failure to do so.

This representation of the GR will suffer from the head-of-line (HOL) blocking problem. My questions:

  1. Do I understand the purpose and definition of the GR facility correctly?
  2. HOL blocking is a general problem for any streaming channel (like TCP). In TCP it is solved by retransmissions, but we are working with sessions that have already finished (probably a long time ago). So I am afraid that a perfect, or even reasonable, recreation of the stream is not possible without explicit requests (informing the GR about the session end) or fixed-threshold buffering (flushing once the buffer holds more than a threshold number of entries). What are your thoughts about it? What kind of HLD (high-level design) should the GR follow?
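
The two ways out of HOL blocking mentioned above (a fixed buffering threshold and an explicit end-of-session flush) could be sketched roughly as follows. This is an illustration, not a proposed design: `BoundedReorderBuffer`, `Released`, `maxPending` and `missingBytes` are hypothetical names, the initial sequence number is assumed to be known, and reporting a failure is reduced to counting the bytes that were skipped.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct TcpSegment { std::uint32_t seq; std::string payload; };

struct Released
{
    std::vector<TcpSegment> segments;  // in stream order
    std::uint32_t missingBytes = 0;    // bytes skipped because they never arrived
};

class BoundedReorderBuffer
{
public:
    BoundedReorderBuffer(std::uint32_t initialSeq, std::size_t maxPending)
        : m_next(initialSeq), m_maxPending(maxPending) {}

    Released push(TcpSegment seg)
    {
        Released out;
        if (static_cast<std::uint32_t>(seg.seq - m_next) >= 0x80000000u)
            return out;  // behind the expected position: dropped
        const std::uint32_t key = seg.seq;
        m_pending.emplace(key, std::move(seg));
        drain(out);
        // Threshold reached: stop waiting for the hole and move on.
        while (m_pending.size() > m_maxPending) { skipGap(out); drain(out); }
        return out;
    }

    // Explicit "session ended" notification: release everything that is left.
    Released flush()
    {
        Released out;
        while (!m_pending.empty()) { skipGap(out); drain(out); }
        return out;
    }

private:
    // Release the run of segments that starts exactly at m_next.
    void drain(Released& out)
    {
        for (auto it = m_pending.find(m_next); it != m_pending.end();
             it = m_pending.find(m_next))
        {
            m_next += static_cast<std::uint32_t>(it->second.payload.size());
            out.segments.push_back(std::move(it->second));
            m_pending.erase(it);
        }
    }

    // Jump to the nearest pending segment after m_next (in circular order).
    void skipGap(Released& out)
    {
        auto it = m_pending.lower_bound(m_next);
        if (it == m_pending.end()) it = m_pending.begin();
        const std::uint32_t gap = it->first - m_next;
        if (gap >= 0x80000000u) { m_pending.erase(it); return; }  // stale entry
        out.missingBytes += gap;
        m_next = it->first;
    }

    std::uint32_t m_next;
    std::size_t m_maxPending;
    std::map<std::uint32_t, TcpSegment> m_pending;
};
```

With `maxPending` set to 2, a third out-of-order segment forces the buffer to give up on the hole in front of the earliest pending segment, and `flush()` empties whatever remains when the caller knows the session is over.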

echo-Mike avatar Sep 21 '19 20:09 echo-Mike

In my terms, the GR sees the sequence r(N+M), which is produced as follows (C++ pseudocode):

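#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <optional>
#include <vector>

// A contiguous chunk of payload bytes.
struct range
{
    std::size_t size;
    void* data;
};

// One TCP payload together with its sequence number.
struct data_part
{
    std::uint32_t tcp_seq;
    range data;
};

// A sequence of parts plus a cursor pointing at the next part to hand out.
struct data_seq
{
    std::vector<data_part> parts;
    std::vector<data_part>::iterator current;
};

// Empty while nothing can be released yet; otherwise the ranges now in order.
using segment = std::optional<std::vector<range>>;
using callback_f = std::function<void(const std::vector<range>&)>;

// Picks the next part at random from s or q, keeping the order within each.
data_part& r_generator(data_seq& s, data_seq& q)
{
    bool s_ended = s.current == s.parts.end();
    bool q_ended = q.current == q.parts.end();
    if (s_ended && q_ended)
        throw 1; // both sequences are exhausted
    if (s_ended)
        return *q.current++;
    if (q_ended)
        return *s.current++;
    return std::rand() % 2 ? *s.current++ : *q.current++;
}

struct GR
{
    segment reassemble(const data_part& new_part);
};

segment GR::reassemble(const data_part& new_part)
{
    /* magic here */
    return std::nullopt; // placeholder so that the sketch compiles
}

void reassembly_routine(callback_f user_callback)
{
    data_seq s, q; // assumed to be filled with the parts of s(N) and q(M)
    s.current = s.parts.begin();
    q.current = q.parts.begin();
    GR gr;
    segment seg;
    for (;;)
    {
        try {
            seg = gr.reassemble(r_generator(s, q));
        } catch (int) { return; }
        if (!seg)
            continue;
        user_callback(*seg);
    }
}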

echo-Mike avatar Sep 21 '19 20:09 echo-Mike

The sequence r(N+M) represents the TCP packet stream captured from the network, read from a file, or obtained from another source.

echo-Mike avatar Sep 21 '19 20:09 echo-Mike

@echo-Mike thanks for the thorough explanation. I am not familiar with the GR problem you described so it's hard for me to comment about it. I'm also not familiar with the HOL problem.

I think that TCP reordering is very similar to TCP reassembly, with one big difference: in TCP reordering you don't need to recreate the stream, you just need to rearrange the packets in the right order, which should reduce CPU and memory consumption.

Please let me know what you think.

seladb avatar Sep 22 '19 07:09 seladb

Oh, I see now. I had thought that the TCP reorderer should be similar to TCP reassembly but for one stream only. But the description you suggest makes me ask other questions:

  1. Should this potential module work within some burst of packets (so effectively it can be stateless) or across multiple bursts within one session (this will require it to be stateful)?
  2. Are there any requirements for "spitting out" reordered packets (i.e. the output stream of the reorderer, if we take the input sequence of packets as its input stream)? How do you see it?
  3. Should it be a generic module (for any protocol based on unique packet numbering, including retransmissions) or TCP specific (more information can be acquired from the SYN/FIN/RST flags, the ACK number and the SACK option; empty ACK packets may be ignored, as well as keep-alive packets)?
  4. Should it work for both sides of the connection or only for one?

Please, @seladb, share your view on these questions.

echo-Mike avatar Sep 22 '19 09:09 echo-Mike

  1. Should this potential module work within some burst of packets (so effectively it can be stateless) or across multiple bursts within one session (this will require it to be stateful)?

I think it'll be much more useful if it can work on multiple concurrent TCP connections. That way it can work on both a live TCP stream and a pcap file.

  2. Are there any requirements for "spitting out" reordered packets (i.e. the output stream of the reorderer, if we take the input sequence of packets as its input stream)? How do you see it?

I'm not sure what exactly you mean, can you please clarify?

  3. Should it be a generic module (for any protocol based on unique packet numbering, including retransmissions) or TCP specific (more information can be acquired from the SYN/FIN/RST flags, the ACK number and the SACK option; empty ACK packets may be ignored, as well as keep-alive packets)?

I believe we can limit the implementation to TCP which is probably the most popular use case

  4. Should it work for both sides of the connection or only for one?

I think it should work on both sides
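
Taken together, these answers suggest a stateful module that keeps separate re-ordering state per connection and per direction. A rough sketch of that bookkeeping, under loud assumptions: IPv4 only, the first segment seen in a direction defines where that direction starts (a real module would more likely take this from the SYN), and flows are never expired. `FlowKey`, `FlowState` and `MultiFlowReorderer` are hypothetical names, not existing PcapPlusPlus classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

struct TcpSegment { std::uint32_t seq; std::string payload; };

// Identifies ONE direction of a connection; swapping source and destination
// gives the key of the opposite direction.
struct FlowKey
{
    std::uint32_t srcIp, dstIp;
    std::uint16_t srcPort, dstPort;

    bool operator<(const FlowKey& o) const
    {
        return std::tie(srcIp, dstIp, srcPort, dstPort) <
               std::tie(o.srcIp, o.dstIp, o.srcPort, o.dstPort);
    }
};

class MultiFlowReorderer
{
public:
    // Feed one segment of any flow; returns the segments of THAT flow
    // direction which are now in order.
    std::vector<TcpSegment> push(const FlowKey& key, TcpSegment seg)
    {
        auto flow = m_flows.find(key);
        if (flow == m_flows.end())  // assumption: first segment seen starts the flow
            flow = m_flows.emplace(key, FlowState{seg.seq, {}}).first;
        FlowState& st = flow->second;

        std::vector<TcpSegment> released;
        if (static_cast<std::uint32_t>(seg.seq - st.next) >= 0x80000000u)
            return released;  // behind the expected position: dropped

        const std::uint32_t seq = seg.seq;
        st.pending.emplace(seq, std::move(seg));
        for (auto it = st.pending.find(st.next); it != st.pending.end();
             it = st.pending.find(st.next))
        {
            st.next += static_cast<std::uint32_t>(it->second.payload.size());
            released.push_back(std::move(it->second));
            st.pending.erase(it);
        }
        return released;
    }

    std::size_t flowCount() const { return m_flows.size(); }

private:
    struct FlowState
    {
        std::uint32_t next;                           // next expected sequence number
        std::map<std::uint32_t, TcpSegment> pending;  // segments that arrived early
    };
    std::map<FlowKey, FlowState> m_flows;
};
```

Because each direction has its own sequence space, keeping the two directions under separate keys means a hole in the client-to-server data never delays segments going the other way.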

seladb avatar Sep 23 '19 06:09 seladb

hi @echo-Mike please let me know if you have any more questions. Do you think you can implement this module?

seladb avatar Oct 08 '19 07:10 seladb

Hi, @seladb. I currently have no time for open source projects due to my workload. I have some more questions before I can do anything on this issue, but I will only ask them when I have a little more free time.

echo-Mike avatar Oct 08 '19 07:10 echo-Mike

sure @echo-Mike thanks for letting me know. Please feel free to reach out to me if you have more questions

seladb avatar Oct 08 '19 07:10 seladb

This issue was opened a long time ago and there wasn't any traction. I'll close it for now

seladb avatar Apr 01 '23 07:04 seladb