probe-cli
feat(engine): add randomtraffic experiment
Checklist
- [x] I have read the contribution guidelines
- [x] reference issue for this pull request: https://github.com/ooni/probe/issues/2447
- [x] if you changed anything related to how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: https://github.com/ooni/spec/pull/271
- [x] if you changed code inside an experiment, make sure you bump its version number
Description
This test aims to detect the censorship of fully random traffic. In short, the experiment sends random bytes to an IP address chosen at random from a predetermined list of public IP addresses that were affected by this censorship in the past, and records information about the nature of the censorship. This censorship was first observed in the Great Firewall of China (GFW).
Censorship Description
Our team reverse engineered the GFW's new censorship system and determined that it uses the following rules to exempt traffic from blocking:
For the first TCP payload sent by the client, allow the traffic to continue if any of the following hold:
- It matches the protocol fingerprint for TLS or HTTP.
- The first six bytes of the payload are all in the printable ASCII range [0x20, 0x7e].
- More than 50% of the payload bytes are in the range [0x20, 0x7e].
- More than 20 contiguous bytes of the payload are in the range [0x20, 0x7e].
- popcount(payload)/len(payload), the average number of set bits per byte, is less than 3.4 or greater than 4.6.
In addition to these rules, the censorship only occurs when connecting to a certain list of IP addresses.
If the IP address is in the censored range and none of the above hold, there is an approximately 26.3% chance that the connection is censored. For a more detailed description of the censorship, please see the reading copy of our paper.
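The content-based exemption rules above can be sketched as a small classifier. This is only an illustration of the rules as listed, not the code used by the experiment or by the GFW. The function names are hypothetical, the TLS/HTTP fingerprint rule is left out because its details are not given here, and the handling of empty payloads and of payloads shorter than six bytes is an assumption.

```go
package main

import "math/bits"

// isPrintable reports whether b lies in the printable ASCII range [0x20, 0x7e].
func isPrintable(b byte) bool {
	return b >= 0x20 && b <= 0x7e
}

// exemptByContent (hypothetical name) applies the four content-based rules
// to the first TCP payload. The TLS/HTTP fingerprint rule is not modeled.
func exemptByContent(payload []byte) bool {
	n := len(payload)
	if n == 0 {
		return true // assumption: an empty payload gives nothing to classify
	}

	printable, run, longestRun, ones := 0, 0, 0, 0
	for _, b := range payload {
		ones += bits.OnesCount8(b)
		if isPrintable(b) {
			printable++
			run++
			if run > longestRun {
				longestRun = run
			}
		} else {
			run = 0
		}
	}

	// Assumption: the six-byte rule only applies when six bytes are present.
	firstSix := n >= 6
	for i := 0; firstSix && i < 6; i++ {
		firstSix = isPrintable(payload[i])
	}

	bitsPerByte := float64(ones) / float64(n)

	return firstSix ||
		2*printable > n || // more than 50% printable
		longestRun > 20 || // more than 20 contiguous printable bytes
		bitsPerByte < 3.4 || bitsPerByte > 4.6
}
```

Under these rules an all-zero payload has zero set bits per byte and so falls under the popcount exemption, which is consistent with using null bytes as a control in the procedure.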
Test Goals and Procedure
The main goal of the test is to inform the user whether or not they are experiencing censorship on connections that send fully encrypted packets that appear random, as well as to record information about censored packets in order to better understand the censorship algorithm. The test seeks to accomplish these goals by doing the following:
1. If no IP address is given by the user, select an IP address from the list of IP addresses in the affected range.
2. Complete a TCP handshake with the IP address and send a stream of null bytes as a control test. If this control test succeeds, proceed with the experiment. Otherwise, repeat the control test with a new IP address up to two more times, stopping as soon as one succeeds. If no control test succeeds, end the test and return the error.
3. Complete a TCP handshake with the IP address and send a stream of random bytes. If this connection times out, attempt to connect once more to check for residual censorship. If the residual censorship test also times out, end the test, record information about the blocked packet, and inform the user that they are experiencing censorship. Otherwise, continue with the test.
4. Repeat step 3 another 19 times to account for the blocking rate.
5. If no errors occurred and the test ran to completion, close all connections and inform the user that they are not experiencing censorship.
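The control flow of the procedure above can be sketched as follows. This is a simplified illustration, not the experiment's actual code. The `sender` hook, `errTimeout`, `result`, and `runRandomTraffic` are hypothetical names, and the network I/O is abstracted behind `sender` so that the logic can be read and exercised without a live connection. The payload length, the reuse of the same payload for the residual censorship check, and ending the test on a non-timeout error are assumptions. Random selection of addresses is left to the caller.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// errTimeout is a hypothetical sentinel meaning the connection timed out.
var errTimeout = errors.New("connection timed out")

// sender is a hypothetical hook: it completes a TCP handshake with addr,
// sends payload, and returns errTimeout if the connection times out.
type sender func(addr string, payload []byte) error

// result summarizes one run of the procedure.
type result struct {
	Addr           string
	Censored       bool
	BlockedPayload []byte
}

const (
	maxControlTries = 3  // first control attempt plus two retries
	randomRounds    = 20 // first random-payload round plus 19 repetitions
	payloadLen      = 64 // assumption: the length is not specified here
)

// runRandomTraffic tries the control test on up to maxControlTries addresses,
// then sends random payloads to the first address that passed the control.
func runRandomTraffic(send sender, addrs []string) (result, error) {
	var addr string
	var ctrlErr error = errors.New("no candidate addresses")
	for i := 0; i < len(addrs) && i < maxControlTries; i++ {
		// Control test: a stream of null bytes.
		if ctrlErr = send(addrs[i], make([]byte, payloadLen)); ctrlErr == nil {
			addr = addrs[i]
			break
		}
	}
	if ctrlErr != nil {
		return result{}, fmt.Errorf("control test failed: %w", ctrlErr)
	}

	for round := 0; round < randomRounds; round++ {
		payload := make([]byte, payloadLen)
		if _, err := rand.Read(payload); err != nil {
			return result{Addr: addr}, err
		}
		err := send(addr, payload)
		if err == nil {
			continue
		}
		if !errors.Is(err, errTimeout) {
			return result{Addr: addr}, err // assumption: other errors end the test
		}
		// Residual censorship check: connect once more after a timeout.
		if err := send(addr, payload); errors.Is(err, errTimeout) {
			return result{Addr: addr, Censored: true, BlockedPayload: payload}, nil
		}
	}
	return result{Addr: addr}, nil
}
```

A real `sender` would wrap a TCP dial, a write, and a deadline; keeping it as a function value lets the decision logic be checked with a stub that simulates a censored or uncensored path.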
False Negative and False Positive Rates
Using an IP known to be in the censored range, the false negative rate (the rate at which the test will say there is no censorship present when in fact there is) of this test was calculated to be approximately 1.05%. On the other hand, after running the test 10,000 times from a location not experiencing censorship, no false positives were recorded.
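As rough intuition for why the random-payload step is repeated, the sketch below computes the chance that n connections all escape blocking, under the simplifying assumption that each connection is blocked independently with the same probability. This is an idealized model with a hypothetical function name, and it is not necessarily how the rate above was obtained. Measured rates can differ from it because of the residual censorship check, packet loss, and other network effects.

```go
package main

import "math"

// missProbability (hypothetical helper) returns the chance that n connections
// all go unblocked, assuming each one is blocked independently with
// probability p. Real connections need not be independent.
func missProbability(p float64, n int) float64 {
	return math.Pow(1-p, float64(n))
}
```

For example, `missProbability(0.263, 20)` gives the idealized miss chance for 20 random-payload connections at the 26.3% blocking rate quoted in the censorship description.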
IP List Construction
The IP list was created by first obtaining a large list of public TCP servers. The test was then performed five times on each IP address from a computer where censorship is expected. The final list consists only of the IP addresses that reported censorship all five times. For one of these IP addresses to be outside the censored range, all five reports of censorship would have to be false positives, which we know to be extremely unlikely, so we can label these IP addresses as being in the censored range.
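The filtering step described above can be sketched as follows. The `censoredFunc` hook and `buildIPList` are hypothetical names. The hook stands in for one full run of the test against an IP address from the censored vantage point.

```go
package main

// censoredFunc is a hypothetical hook that runs the test once against ip
// and reports whether censorship was observed.
type censoredFunc func(ip string) bool

// runsPerIP is the number of times each candidate is tested.
const runsPerIP = 5

// buildIPList keeps only the candidates for which every run reported
// censorship. It stops testing a candidate at its first uncensored run.
func buildIPList(candidates []string, test censoredFunc) []string {
	var kept []string
	for _, ip := range candidates {
		allCensored := true
		for i := 0; i < runsPerIP; i++ {
			if !test(ip) {
				allCensored = false
				break
			}
		}
		if allCensored {
			kept = append(kept, ip)
		}
	}
	return kept
}
```

Requiring all five runs to agree trades list size for confidence: an address that is censored but escapes blocking in one of the runs is dropped, while an uncensored address would need five false positives in a row to be kept.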