rperf icon indicating copy to clipboard operation
rperf copied to clipboard

Code formating and refine

Open ssrlive opened this issue 1 year ago • 33 comments

update do latest mio and support windows.

ssrlive avatar Dec 17 '23 11:12 ssrlive

Hey, thank you very much for taking an interest in this project!

I would like to welcome the changes you have submitted, but I am afraid the sudden, and rather large, pull-request caught me off-guard. I have only just now finished reading through the changes you have made, so I was unable to respond sooner.

Usually, refactoring and alterations on this scale would come after some sort of discussion with the project's maintainer to determine what needs to be done and why. Because I do not yet know your intentions, I am afraid that I cannot accept this pull-request yet, but I would like to figure out how we can avoid creating an unnecessary fork of rperf.

Aside from a few small details, which I will comment on in the commits themselves, I do not see anything problematic, so the bulk of my concern rests with your statement about publishing your revisions to https://crates.io/ after one month has passed.

I do not see the value of rperf existing as a Crate. In my opinion, it is an application, not a library. Furthermore, neither I nor my team have the bandwidth to enable its use as a library: its internal interfaces are not documented for use by other developers, its implementation may change unexpectedly, and the amount of support requests that follow from having a supply-chain-hosted library with varying versions would likely kill the project.

If you need to have rperf exist as a public Crate, please explain why so we can figure out how it can be maintained going forward. cargo can pull from private repositories and other sources, so your build pipeline should be unaffected if one of those suits your needs, and it is a GPLv3 project, so you are free to fork it and distribute your own version if you require some changes that do not belong upstream.

Another detail of importance is attribution. If your changes are to be merged, for GPLv3 compliance (copyright, proof of ownership), it is important that you provide some sort of sign-off that indicates that you grant use of your changes to the project under the terms of the license agreement. This is usually as simple as putting your name and an e-mail address in the header of any files you have materially changed (a lot of your commits are whitespace and linting changes, so those do not need to be updated); if there is some reason for which you cannot do this, an online handle and a link to your GitHub profile should be fine.

Related to your recent changes for Windows, I ended up porting the TCP stream logic to socket2 when I tackled the problems associated with mio and Poll-reassociation, though it is possible that your tcp_stream_try_clone() implementation may serve as an alternate solution. See https://github.com/opensource-3d-p/rperf/issues/3 and https://github.com/opensource-3d-p/rperf/compare/master...socket2 for details -- if you are experiencing any reliability issues, you may want to check your logic against that branch. I was planning to publish a new version based on it before the end of the year, now that I finally have access to a Mac so I can verify that macOS (Apple Silicon) works properly, but it appears that your code was derived from master, which is an older branch.

flan avatar Dec 19 '23 18:12 flan

My original intention was to find a replacement for iperf3, which is full of bugs and weird behaviors that are difficult to fix. So I tried rperf. But I found that there is no bind parameter in rperf, so I added it. But here, after I saved the file, I found that after automatic formatting, the code had changed in many places, so I formatted the entire project according to Rust's default code style. Then I realized I had gone too far. But there was no point in stopping. So upgrade clap, upgrade others, and finally mio. This is what it looks like now. I retract my statement about publishing to crateio, and I'd like your cooperation to make rperf what it should be. I will modify the code based on your suggestions until it meets everyone's expectations.

ssrlive avatar Dec 20 '23 00:12 ssrlive

I would like you to publish it to cratesio yourself so other users can simply install it via ‘cargo install rperf’.

ssrlive avatar Dec 20 '23 00:12 ssrlive

The reason I did so much work before was that I found that no new code had been submitted to this project for a year. I thought this project had been abandoned, and I felt it was a pity.

ssrlive avatar Dec 20 '23 00:12 ssrlive

Ah, I'm sorry that the project gave the impression that it was abandoned.

It's an open project I maintain out of interest and on behalf of my employer, because iperf2 is very unstable and iperf3 is not well-suited to continuous operation, in addition to its flaws, regressions relative to iperf3, and very long patch-review process.

rperf is an important part of the products we develop, so it has some internal QA requirements, preventing experimental ideas from being published early and slowing down the official release cycle. That, combined with my needing hardware (and time) to ensure macOS can be supported, is why the branches from August and September have not yet been promoted. I do try to respond quickly to issues as they come up in all of my projects, though.

It hasn't received frequent updates because it hasn't really needed them. It has been very stable and close to feature-complete in all contexts in which my company has needed it, which is a relatively finite scope, given that its purpose is to test network reliability and Layer-3 networking is a well-understood problem that does not change very quickly.

As for distribution, my plan was to provide precompiled binaries for Microsoft and Apple platforms (and Linux above some reasonable version of libc, likely whatever shipped with Debian Bullseye) after the changes in the September branch had been validated through field-testing, bumping the version to 1.0 at the same time.

While I do see the appeal of making it easy to install on any system that has a Rust build-chain, I suspect the most common use-case is that of a network tech just wanting to run a test, probably carrying it on a USB drive, and, at least in my experience, the moment they have to think about the state of what's installed on their computer, they look for another option.

I view crates.io as being another thing to maintain, but it's possible that I just don't know enough about its update pipeline. I do worry that it would increase the amount of support that would be required without providing significant benefit to most users.

flan avatar Dec 20 '23 01:12 flan

I have tested on macOS, it works fine. But It's not fine on windows, it's always report timeout. it should be fixed.

ssrlive avatar Dec 20 '23 02:12 ssrlive

That sounds like the problem that led me to replace Mio with socket2. WinSock refuses to allow its handle to be accessed again from another thread and mio doesn't allow its Poll-association to change.

I don't know whether it would be easier to start from the socket2 branch and port your other changes on top of that or to try porting in the socket2 changes.

A number of my colleagues have been using a Windows build from the socket2 branch without issue, so I'd like of prefer to avoid having to request regression-testing.

flan avatar Dec 20 '23 02:12 flan

I will try to port your socket2 branch to my branch.

ssrlive avatar Dec 20 '23 03:12 ssrlive

done

ssrlive avatar Dec 20 '23 08:12 ssrlive

I'll read through this today and hopefully be able to get some testing done over the weekend

Thank you

flan avatar Dec 20 '23 14:12 flan

I find that if rperf run in remote VPS as a server, command rperf -c <remote_addr> -R will fails. do you noticed this?

ssrlive avatar Dec 20 '23 14:12 ssrlive

It might help to be more specific about what fails, but I'll try with those I have access to.

This sounds like a probable firewalling issue, though.

By default, rperf will use any ports the OS offers (it binds to 0, which lets the host OS choose, and then client and server share this information through the TCP control socket).

Many ISPs and OSes, especially those connected directly to the Internet, will block inbound traffic on all ports by default, so even though rperf opened a port to receive data, an external mechanism is likely to prevent traffic from actually reaching that port.

To address this, be sure that you know of some ports that are open (you can try using telnet, socat, or just throw something together in Python), and provide the argument --tcp-port-pool. This will change the behaviour from letting the host decide to having rperf manage its own set of ports. Values between 60,000 and 64,000 are usually best. Many OSes consider the range between 10,000 and 60,000 to be free for dynamic allocation, with the higher ranges considered reserved for private use. Below this range lies IANA allocation, and above is sometimes used by vendor software.

flan avatar Dec 20 '23 15:12 flan

In iperf3, the -R parameter represents reverse mode. In this mode, the data transmission direction is opposite to the normal mode. In other words, data is sent from the server to the client.

However, this does not mean that the server actively connects to the client. During the process of establishing a connection, the client still actively connects to the server. Once the connection is established, the server starts sending data to the client.

Therefore, even when the -R parameter is used, the connection establishment is still actively initiated by the client.

I looked at your code and I saw that your server is actively connecting to the client, which is not applicable to internal networks behind NAT or firewalls.

https://github.com/opensource-3d-p/rperf/blob/14d382683715594b7dce5ca0b3af67181098698f/src/stream/tcp.rs#L474-L477

Do we need to rewrite the code?

ssrlive avatar Dec 21 '23 09:12 ssrlive

For rperf's intended use-case, NAT was never a concern.

3D-P's typical deployment scenario is a closed network, almost always air-gapped from the Internet. You could kind of think of it as being a data-centre, but where every server is on a mobile vehicle with frequent connection loss.

Because of the nature of these networks, and the abundance of strangely constructed proprietary applications found within, we have found that having rperf operate like many control systems, where the server does indeed connect directly to clients, exposes firewall configuration issues and other behavioral anomalies (like poorly configured multi-path routing) much more quickly than waiting for periodic reports and troubleshooting requests.

Personally, I feel that there is no reason that iperf3 and rperf cannot or should not coexist, just that iperf3 does not cover the always-on, multi-client scenario very well (if at all), and that to extend it to handle this case would needlessly complicate its code and testing requirements. iperf3 is very useful as a ubiquitous, supervised testing tool, while I see rperf as being built for passive monitoring, to be integrated into a larger network-health-management solution, with supervised testing being a secondary case. I view rperf's role as that of a rival, pushing iperf3 to fix its problems and excel in its strengths, though it's nowhere near popular enough to have that influence yet. For my part, I intend to continue to submit reports and patches to iperf3 as I encounter critical issues that do not require it to be extensively rearchitected, and iperf3 continues to be part of the products that 3D-P ships.

My preference, when I write the 1.0 release documentation, would be to recommend iperf3 for the Internet use-case, where NAT is likely to be common, because running an unsupervised, Internet-exposed server that can quickly lead to significant data-transmission costs is just a bad idea, even before considering security risks, while rperf is right for closed networks where the value is watching for outages and changes over time.

If you really want to add NAT support, I'd strongly suggest doing it in another branch because it will dramatically increase the complexity of what already needs to be reviewed and tested, and it will break protocol compatibility, but it cannot replace existing behaviour (it must be an optional feature), it must work for UDP (and have a plan for SCTP), which means understanding how every major firewalling system determines whether a packet is "related" in conntrack nomenclature (likely achievable by sending a burst of trace packets to show a point of origin before responses are sent to the same place, but I can only speak for recent Linux), and it must be reasonably secure against Internet attacks, which means the client and server must identify each other before sending real traffic, for which a public-key exchange may be required (a plaintext PSK is susceptible to hijacking), and the control channel should probably be encrypted -- just having the control channel run SSL from the start and using a PSK after that may be sufficient, but it should also be possible to assign a different PSK or certificate on a per-host basis; the data channels can be unencrypted because they carry no sensitive data, and probably should be plaintext to provide more accurate throughput numbers (at least one user runs rperf on an 8-bit controller with no floating-point unit for legitimate industrial purposes, so library suitability in cryptography must be evaluated), though the possibility of malicious packet injection exists for UDP and probably always will, unless its header gains a nonce, which is problematic in its own way due to a lack of guaranteed time-synchronisation, even if a baseline delta is established in initial negotiation and all clients support a non-tick-based system clock -- I've seen some very extreme clock-drift on allegedly robust systems, even within the span of a single real-world minute.

flan avatar Dec 21 '23 16:12 flan

got it.

ssrlive avatar Dec 21 '23 16:12 ssrlive

Adjusted following suggestion.

ssrlive avatar Dec 22 '23 04:12 ssrlive

I'm not entirely sure where this broke, but https://github.com/opensource-3d-p/rperf/blob/socket2/test.py#L128 is failing.

======================================================================
FAIL: test_udp_reverse (__main__.TestIpv4.test_udp_reverse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/flan/projects/ssrlive/rperf/./test.py", line 143, in test_udp_reverse
    self.assertEqual(result['summary']['bytes_received'], result['summary']['bytes_sent'])
AssertionError: 0 != 1000224

It seems to be related to the way omit is being handled. The time-value can be increased to -t 10, but if -O 1 is specified, it's like the bytes-received event isn't propagated at all.

https://github.com/ssrlive/rperf/blob/master/src/protocol/results.rs#L648 is where the problem is manifesting, since the results only has a single interval for UDP receive:

{
    ...
    "streams": [
    {
      "abandoned": false,
      "failed": false,
      "intervals": {
        "receive": [
          {
            "bytes_received": 5001120,
            "duration": 10.000581741333008,
            "jitter_seconds": 4.642256044462556e-6,
            "packets_duplicated": 0,
            "packets_lost": 0,
            "packets_out_of_order": 0,
            "packets_received": 4140,
            "timestamp": 1703266093.4379778,
            "unbroken_sequence": 4140
          }
        ],
        //there should be ten of these, around one second each
        //instead, we only see one, of approximately ten seconds
    ...

This is probably happening because something changed in the UDP receive loop so it isn't emitting a result after a second (https://github.com/ssrlive/rperf/blob/master/src/stream/mod.rs#L26) has elapsed. Might have been an over-simplification in refactoring.

For the others, having delta values being a bit off in the tests isn't ideal, but I wouldn't consider them an outright failure, rather something to be assessed manually. A value of 0, however, indicates that something isn't working at all.

flan avatar Dec 22 '23 17:12 flan

changed.

ssrlive avatar Dec 22 '23 18:12 ssrlive

Verified that the tests are mostly working now, except for the UDP receive thing. The remaining issues seem to be related to the environment in which I'm conducting tests, since there are a bunch of tunnels.

It affects both forward and reverse streams, so it's almost certainly just that the loop needs to emit updates every time the interval elapses, like the TCP-receive loop.

flan avatar Dec 22 '23 18:12 flan

So I don't know how to deal with it.

ssrlive avatar Dec 22 '23 18:12 ssrlive

I'll try to get to it this weekend. It shouldn't be complex, but I'll need to set aside a bit of time to experiment.

I've still got a lot of work I need to finish this week and next, and I wasn't expecting to work on rperf at all before January, so I'm kind of scrambling to find time right now.

flan avatar Dec 22 '23 18:12 flan

It doesn't seem to show up in this thread, but I pushed a fix for the UDP regression to your master branch: https://github.com/ssrlive/rperf/commit/fb68b59543f6dbaa17746797b9648b7acf4e03a3

There was an inner loop that seems to have been introduced as part of removing mio.

All other tests seem to be operating within reason right now, so hopefully I'll be able to get non-Linux platform testing done this week, then I'll be glad to merge this, though I won't be ready to call it 1.0 until a few other people in my team at work have had an opportunity to try to break it.

flan avatar Dec 25 '23 18:12 flan

Here is the result in my windows. it looks OK.

PS C:\rperf> cargo r -- -c 192.168.3.12 -u
==========
UDP send result over 10.11s | streams: 1
stream-average bytes per second: 124473.545 | megabits/second: 0.996
total bytes: 1259040 | per second: 124473.545 | megabits/second: 0.996
packets: 1220 per second: 120.614
==========
UDP receive result over 10.01s | streams: 1
stream-average bytes per second: 68575.377 | megabits/second: 0.549
total bytes: 686280 | per second: 68575.377 | megabits/second: 0.549
packets: 665 | lost: 555 (45.5%) | out-of-order: 0 | duplicate: 0 | per second: 66.449
jitter: 0.018021s over 86 consecutive packets

PS C:\rperf> cargo r -- -c 192.168.3.12 -u -R
==========
UDP send result over 19.38s | streams: 1
stream-average bytes per second: 87810.693 | megabits/second: 0.702
total bytes: 1701768 | per second: 87810.693 | megabits/second: 0.702
packets: 1649 per second: 85.088
==========
UDP receive result over 19.67s | streams: 1
stream-average bytes per second: 86504.380 | megabits/second: 0.692
total bytes: 1701768 | per second: 86504.380 | megabits/second: 0.692
packets: 1649 | lost: 0 (0.0%) | out-of-order: 0 | duplicate: 0 | per second: 83.822
jitter: 0.005764s over 1649 consecutive packets

ssrlive avatar Dec 26 '23 04:12 ssrlive

And it's OK in macOS.

ssrlive avatar Dec 26 '23 05:12 ssrlive

I've also been able to do some testing between Linux (amd64 and arm), macOS (arm), and Windows (x64 and arm) and what's in this branch seems to work, though I still need to let others in my team try to break it before finalising anything.

Still missing is some way for me to attribute your work, though. How would you like to be added to the relevant copyright lists?

Also, I see that you are working on a NAT-traversal branch. I'm interested in seeing how that turns out, but I am scheduled to be on vacation for the next two weeks, so aside from answering questions, I will not be able to do any testing or review any code that might come out of it.

flan avatar Dec 29 '23 21:12 flan

I've completed support for NAT, just on the nat branch, and it seems to be performing well so far.

After the current merge work is completed, I will submit another PR and I will ask for your opinion whether to choose the refactor branch or the nat branch.

The refactor branch does not add new functionality, it just refactors the code to make it more readable.

ssrlive avatar Dec 30 '23 12:12 ssrlive

Hi @flan , today is 2024. Now you can review this PR?

ssrlive avatar Jan 13 '24 07:01 ssrlive

I'm back from vacation now and I've gone through the changes; I think the master branch is in good shape.

I'm going to try to arrange for some testing with our QA team on Thursday. If we can't find any broken edge-cases, I'll proceed with merging, though I do still need to know how to credit your contributions for GPL reasons.

At that point, we can either call it 1.0 and I'll provide binaries or we can defer the jump to 1.0 until after the NAT changeset is complete.

flan avatar Jan 15 '24 16:01 flan

This step can apply after merged.

I do still need to know how to credit your contributions for GPL reasons.

It can be added my name next line of your name.

ssrlive avatar Jan 19 '24 07:01 ssrlive

Testing on Thursday went well, so I'm ready to merge and drop my older branches.

I really do need to be able to credit you in some way before the code lands in the main repository, though, even if it's just attribution like ssrlive <https://github.com/ssrlive> instead of a more traditional name and e-mail address.

flan avatar Jan 20 '24 17:01 flan