ndt-server icon indicating copy to clipboard operation
ndt-server copied to clipboard

ndt5: emulate Web100 variables required by OONI's UI

Open bassosimone opened this issue 5 years ago • 11 comments

I noticed that ndt5 only sets MinRTT and zeroes all other variables. There is a bunch of variables that we use in OONI. This issue tries to document what we'll be missing in the new platform and suggest ways to compute these values. The discussion is from the point of view of the variables that need to be computed (and possibly added) to the ndt5 code. All the variables that are listed in this issue cause users to see incorrect values. See #163 for the less critical variables.

Implemented

  1. MinRTT: already computed but I suggest to check the tcpi_min_rtt field of tcp_info (not mapped in the Go code ndt5 is using but definitely in the kernels). I suggest to use that tcp_info field rather than the minimum value found when sampling the RTT.

Figured out 🤞

  1. SumRTT and CountRTT: MK/OONI uses them to compute the average RTT. I suggest to fill these two fields as part of the loop currently used to compute MinRTT.

  2. MaxRTT: I suggest to add the MaxRTT field (it's not one of the MUST-have fields) and to compute it using the loop currently used to compute the MinRTT.

  3. PktsOut: this can be computed from the tcpi_segs_out field [make sure this does not also count the pure ACKs]

  4. CurMSS: use tcpi_snd_mss field of tcp_info

Pessimistic, no clue

  1. CongestionSignals: I time boxed myself and could not find a reliable way of estimating the value of this field from the data in tcp_info. Also, I am pessimistic about this. [help needed]

  2. AckPktsIn: like CongestionSignals (but less pessimistic) [help needed]

  3. DupAcksIn: like *CongestionSignals (less pessimistic) [help needed]

  4. Timeouts: like *CongestionSignals (pessimistic) [help needed]


Discovered by me and discussed with @stephen-soltesz. Added as assignees also @mattmathis and @gfr10598 hoping that they may correct me, if wrong, and help where I am stuck.

bassosimone avatar Aug 02 '19 18:08 bassosimone

Have you also looked at the tcpinfo variables to see whether they could be used? Is ndt5 getting a tcpinfo snapshot already?

Ah - I see that you are looking at the tcpinfo.

We should also work toward reducing redundancy between experiments and tcpinfo, but probably won't be able to do that until late Q3.

gfr10598 avatar Aug 05 '19 13:08 gfr10598

related to https://github.com/m-lab/ndt-server/issues/159

stephen-soltesz avatar Aug 26 '19 16:08 stephen-soltesz

@critzo Please put the metrics CIRA cares about here too

pboothe avatar Sep 09 '19 16:09 pboothe

https://github.com/m-lab/ndt-server/issues/163#issuecomment-529553532

Per the email thread with CIRA, they needs the following metrics (and whatever fields make up the metrics):

  • Jitter (if possible - @mattmathis needs to clarify)
  • Packet loss
  • Upload speed
  • Download speed

stephen-soltesz avatar Oct 03 '19 12:10 stephen-soltesz

Additionally, here is an updated message from CIRA. In summary, the metrics @stephen-soltesz posted in the previous comment should be the only ones CIRA needs, and if available they may also show TCP_INFO variables as a replacement for web100 ones in the current client.


CIRA SAYS:

If we don’t map (or recreate) web100 values for the NDT test from TCP_INFO, then let’s not show them.

For CIRA (I can’t say about other NDT users), we are not wedded to the NDT web100 variables. We show them. If we instead showed TCP_INFO variables, that works equally well for us.

For CIRA, our main concern is Speed, Latency and Jitter. Our secondary concern is to have the most accurate test for both very low and relatively high speeds. Ie satellite. Our tertiary concern are other TCP issues (ie bufferbloat, etc). If the new NDT 7 would help on any of those. We are also interested.

Our current problem is we have nobody working on IPT at the moment. So other than me trying to keep an eye on things. We have not made a commitment to make a lot of changes to the IPT. But I want to make sure that the new versions of NDT will still meet our needs, so that when I can get a few resources I can make the changes and things are hunky dory._

critzo avatar Oct 03 '19 13:10 critzo

The web100 variables the new server sends are required to be sent by the server as part of the NDT protocol. That said, many of them are no longer applicable in a BBR world. This is a protocol bug. CIRA's client is encouraged to not show them, but the server is not permitted to not send them.

Speed they have (and have always had), latency they have (in the form of MinRTT). "Jitter" is a term they have to define for us - like I have said many times before: "jitter" is an ambiguous term.

The new version of NDT will meet their needs much better than the old one (in terms of improved measurement accuracy, better metrics, reduced error rates, increased ease of integration, and more). They will have to do development work to use it. It sounds like they don't actually need all the old Web100 metrics filled in, which is good.

pboothe avatar Oct 04 '19 14:10 pboothe

Packet loss is desired by lots of people. I still need to figure out how and whether it is sent.

pboothe avatar Oct 04 '19 14:10 pboothe

In summary: it sounds like CIRA is flexible enough that I am not longer worried about NDT5 and NDT7 meeting their needs as long as it meets the needs of others.

pboothe avatar Oct 04 '19 14:10 pboothe

Packet loss is desired by lots of people.

For ndt7, I was leaning towards telling people about %-age of bytes retransmitted from TCPInfo. It is not the packet loss rate, but it is something probably close enough. What do you think?

bassosimone avatar Oct 04 '19 18:10 bassosimone

Because NDT pretty much only ever sends full packets, the percentage of bytes retransmitted should closely match the percentage of packets retransmitted. This sounds good to me.

pboothe avatar Oct 21 '19 14:10 pboothe

As mentioned in another issue, OONI will switch to ndt7 soon, therefore it is becoming less relevant to us to focus on ndt5. I am again not sure about closing, maybe it's still relevant in other contexts?

bassosimone avatar Feb 13 '20 18:02 bassosimone