libpcap icon indicating copy to clipboard operation
libpcap copied to clipboard

Add support for PACKET_FANOUT on Linux

Open tsnoam opened this issue 4 years ago • 29 comments

This work is based on PR #674 while taking into consideration the review comments which caused the PR to be closed by the original author, @xpahos.

tsnoam avatar Oct 23 '19 16:10 tsnoam

Hi all, are there any issues with merging this PR?

mluscon avatar Feb 11 '20 12:02 mluscon

I wonder if there's a more generic way to expose this, as many network stacks provide similar capabilities. Ones I definitely know of are DPDK and XDP (technically Linux, too, but you configure it by other means), but I expect most other mainstream *nixes to have equivalent features.

Oppen avatar Feb 13 '20 18:02 Oppen

@Oppen I get what you're saying that a consolidated way is for the best. However, since DPDK, XDP & PF_PACKET are so different in semantics, requiring the direct consumer to understand:

  1. They're using one technology over another.
  2. How the technology works (how to spawn threads, how to associate them with the kernel, who allocates the memory?).
  3. As of "2" above, how to properly configure the fanout? I would say that such consumer would want the full power to configure fanout without being masked out of control.

From what I gather, libpcap currently has initial support for DPDK, while it provides no support for XDP. I suggest to currently implement fanout support for PF_PACKET as suggested in the PR (maybe even mark it as "unstable API" in the manpage). Once there will be enough code & knowledge about how to use other tech, it will be possible to either have separate APIs or find a way to consolidate them all.

tsnoam avatar Feb 19 '20 08:02 tsnoam

@tsnoam I see your point, but I think masking implementation details is more or less the point of using libpcap as an abstraction. I'm OK-ish with marking it as an unstable API, although I have my doubts on that concept (eventually it has users and you can never drop it without facing their wrath). Note though that I'm not a maintainer. My opinions and suggestions are just that, and carry no weight towards acceptance. Just in case you feel discouraged by my reluctance to make it platform-specific at the API level.

Also, it may be interesting to take a look at this patchset.

Oppen avatar Apr 14 '20 01:04 Oppen

@tsnoam I see your point, but I think masking implementation details is more or less the point of using libpcap as an abstraction. I'm OK-ish with marking it as an unstable API, although I have my doubts on that concept (eventually it has users and you can never drop it without facing their wrath). Note though that I'm not a maintainer. My opinions and suggestions are just that, and carry no weight towards acceptance. Just in case you feel discouraged by my reluctance to make it platform-specific at the API level.

Also, it may be interesting to take a look at this patchset.

  1. Again, I get what you're saying about properly wrapping all various technologies into a single API, hiding implementation details from the user. However, waiting until everything was properly learned and consolidated may take too long and practically stops progress.
  2. I looked at the patchset you've sent. It seems to be already merged into the kernel. If you were referring to the fact that there are more functions to wrap: well, this is exactly the point: you make small progress by slowly adding functions to competing APIs (packet socket vs dpdk vs ... ). One day you reach a point where you know enough of all technologies and feel comfortable wrapping them into a single API. But until then, why wait?

tsnoam avatar Apr 16 '20 06:04 tsnoam

Also, thanks for the review. I pushed and updated version of the code.

tsnoam avatar Apr 16 '20 06:04 tsnoam

  • I agree.

  • That wasn't a counter argument or a suggestion for you to implement it now. It was just a "look at this cool thing", that may interest you because you're interested in something similar :) I know it's merged, but the patch has a nice explanation about what it does, and I think it's a good way to point to features.

Oh, thanks for the reference then :-)

In any case, I feel that we do have an understanding that this PR is worthy of finding itself into upstream, with the open question of whether it should be marked as "unstable" pending maintainers decision.

tsnoam avatar Apr 16 '20 14:04 tsnoam

In any case, I feel that we do have an understanding that this PR is worthy of finding itself into upstream, with the open question of whether it should be marked as "unstable" pending maintainers decision.

I think it's definitely worthy of getting upstream. That doesn't mean improving it (whatever that means) is out of the question.

Oppen avatar Apr 16 '20 14:04 Oppen

@tsnoam @mcr @guyharris I tried building libpcap from your branch. And I followed the doc. Everything compiles well and I test it using 3 parallel processes.

So I changed my pcap_open_live to the following in each process:

pcap_handle = pcap_create((char*)pcap_file, err_open);
set = pcap_set_fanout_linux(pcap_handle, 1, 0, root_idx);
pcap_set_snaplen(pcap_handle, snaplen);
pcap_set_promisc(pcap_handle, promisc);
pcap_set_timeout(pcap_handle, 500);
pcap_activate(pcap_handle);

Which prints the desired output:

process (group) id: 0 fanout set: 0
process (group) id: 1 fanout set: 0
process (group) id: 2 fanout set: 0

However, when checking if FANOUT worked, I observe that each process still receives all packets. Note that I choose HASH to be flow-based (0 for mode value) and I know that testing traffic is well balanced based on flow hash (30%, 30%, 30%). Do you have any clue how to fix it?

aouinizied avatar Sep 12 '20 16:09 aouinizied

@aouinizied it's very strange. I'm using the same construction in production:

#if defined(_linux_)
    int val = (getpid() & 0xffff) | (PACKET_FANOUT_LB << 16);
    if (setsockopt(pcap_fileno(pcap.Session()), SOL_PACKET, PACKET_FANOUT, &val, sizeof(val)) < 0) {
        Cerr << "Failed to set fanout: "
             << strerror(errno)
             << "\n";
        return;
    }
#endif

pcap_set_fanout_linux here do the same as

int val = (getpid() & 0xffff) | (PACKET_FANOUT_LB << 16);

Does your root_idx is unique?

xpahos avatar Sep 12 '20 17:09 xpahos

@aouinizied looks like I found the problem:

pcap_set_fanout_linux(pcap_t *p, int enable, uint16_t mode, uint16_t group_id)

I think it will be better to use:

set = pcap_set_fanout_linux(pcap_handle, 1, PACKET_FANOUT_LB, root_idx);

because the third parameter is mode. I think it will be better to remove enable because it's not used in code.

xpahos avatar Sep 12 '20 17:09 xpahos

@xpahos Thanks for the feedback. I will replace PACKET_FANOUT_LB by 0 for FLOW_HASH mode, test, and get back to you.

aouinizied avatar Sep 12 '20 18:09 aouinizied

@xpahos checked that each root_idx is unique but the same observations.

set = pcap_set_fanout_linux(pcap_handle, 1, 0, root_idx)

The only difference with your setup is the mode. You set it to PACKET_FANOUT_LB whereas I use PACKET_FANOUT_HASH as the third parameter.

aouinizied avatar Sep 12 '20 18:09 aouinizied

@aouinizied yes, you right. PACKET_FANOUT_HASH should send packets only to one unique socket per each address port mapping.

xpahos avatar Sep 12 '20 19:09 xpahos

@xpahos another detail that may help reproducing is that I'm using:

int rv_handle = pcap_next_ex(pcap_handle, &hdr, &data);

to poll the received packets. But this should not affect fanout right?

aouinizied avatar Sep 13 '20 00:09 aouinizied

@xpahos I found what was wrong. And it's clearly stated in the doc.

  • group_id, as the name indicates, is a group of processes/threads.
  • FANOUT will duplicate traffic across groups. (what I was observing)
  • For a specific group_id. Each process/thread started with this group_id will receive 1/N of the traffic (where N is the number of processes/threads started with this group_id).
  • The mode defines load balancing strategy (on kernel side).

In my example replacing root_idx by 0 solve it and now I see traffic correctly load-balanced across my processes. @tsnoam Thank you for this great contribution!

A detail that maybe need to be added to the doc is that for PACKET_FANOUT_HASH there is a known bug on some Linux kernel versions. The bug consists in that a src->dst flow traffic will be hashed with a different hash than dst->src and this will result in inconsistency for bidirectional flow-based apps.

aouinizied avatar Sep 13 '20 01:09 aouinizied

@aouinizied Sorry for the delay in response, I'm happy to see that you've eventually fixed the problem. Anyhow, most of the thanks are to @xpahos who had made the original PR, I have based on their work.

@xpahos that's a good catch regarding the enable flag. It appears that I have forgot to actually use it. I have just pushed a commit to do so.

tsnoam avatar Sep 13 '20 05:09 tsnoam

@mcr I've rebased my branch on top of the latest master and made appropriate fixes as of commit 97f59a7d035397c7b4f4889ef5657be811b97e7a

tsnoam avatar Nov 10 '20 09:11 tsnoam

Hi. I wanted to try out fanout support in tcpdump. Is this mode controlled by a command line option to tcpdump or is it always enabled in this branch?

javedshakeel avatar Feb 05 '21 12:02 javedshakeel

@tsnoam is it possible to rebase your branch?

Many thanks for your contribution.

aouinizied avatar Sep 01 '21 12:09 aouinizied

@aouinizied I tried rebasing (and then build), and it passes cleanly.

have you encountered problems trying to rebase yourself?

tsnoam avatar Sep 01 '21 12:09 tsnoam

@tsnoam no but I'm using your fork (until it gets merged within libpcap master) within my project (https://www.nfstream.org/docs/#building-nfstream-from-sources) and seems to be outdated. updating your branch with your rebase would be really helpful.

aouinizied avatar Sep 01 '21 14:09 aouinizied

Hi @aouinizied ,

As the branch can be directly merged to master, i mostly wait for the maintainers to pick it up. Since rebasing will also require me to test that no internal changes make it incompatible - things which can only be seen when running, I prefer to leave my PR and branch as it is, so the maintainers of of libpcap will be able to see on which commit I am based.

Personally, my systems work in production based on my branch, but I do not have the capacity at the moment to test a rebase over latest master.

tsnoam avatar Sep 01 '21 14:09 tsnoam

@mcr Do you have an estimation when this PR will be merged to master? Is there something you expect me to change which I've missed?

Anything I can do to help, let me know!

tsnoam avatar Sep 01 '21 14:09 tsnoam

Thank you for proposing these changes. If you consider them ready for merging, please rebase them on the current master branch and squash the commits together into one or more commits, so each change completely progress the working tree from one consistent working state to another.

infrastation avatar Sep 01 '21 18:09 infrastation

@mcr I've added the documentation as requested. I hope it is clear enough. If not, I'll be happy to fix as necessary.

@infrastation I do consider this change as ready for merging (FWIW, I've been using this code for 2+ years in production). As requested, I've squashed all commits and rebased it over current master.

Regarding CI failure in AppVeyor, it seems unrelated to my change (winflexbison is missing from the system).

tsnoam avatar Sep 02 '21 14:09 tsnoam

@guyharris @tsnoam any plan to merge this PR? I'm using this branch since more than 1 year on a production network and it works well. It will be good to have it as part of next official release.

Thanks, Zied

aouinizied avatar Nov 25 '21 04:11 aouinizied

@guyharris any news regarding this PR?

aouinizied avatar Mar 24 '22 13:03 aouinizied

REMINDER: @infrastation @mcr any estimation of when this PR will be merged?

Zied

aouinizied avatar Aug 24 '22 14:08 aouinizied