
iodine performance improvements and features

Open frekky opened this issue 9 years ago • 78 comments

Overview

This fork of iodine was intended primarily to improve performance by using a TCP-like sliding window protocol for having multiple "in flight" fragments both upstream and downstream. This allows greatly increased performance on high-latency connections. In order to do so, the whole data/ping structure has been changed (details available in doc/proto_00000800.txt).

Some limited testing has been conducted, the results of which can be found in the updated man page.

This has been almost fully tested on Linux amd64 and compiles without warnings; however, no other platforms have been tested yet. Due to some hacks to get millisecond timer precision on Windows (see windows.h for gettimeofday() and struct timeval macros), various functionality may not work as expected there.

Unit tests have been updated to suit changes to the main code base, and a basic sliding window test was created which tests some of the essential functions.

Issues

This fork is still in development, and I plan to keep it up to date with the main iodine repository as much as possible. There are probably lots of currently undiscovered bugs and certainly lots of problems with intolerant DNS servers which cause performance and connectivity issues.

To help diagnose these problems, I strongly recommend that you try -V 5 to print connection statistics such as the number of queries per second, fragments lost, failures, timeouts, round-trip time etc.

  • High query rates (such as >50 / sec) will probably result in DNS servers dropping queries or responding with errors or invalid replies
  • Try changing the encoding to something which uses less strange characters (such as base64/base32) or the DNS query type if you have total connection failures
  • Reduce the upstream and downstream window sizes (using -w and -W options) from the default values to something more suitable to your connection: lower round-trip time means the window size does not have to be so large to get the same throughput.
  • If connection succeeds but data stops flowing and DNS queries are still being answered correctly (check the stats printout for this information), rebuild iodine and iodined with make debug. Turn on more debugging with -DDDDDD (use less Ds if you experience graphic lag in your terminal due to excessive output) and copy the debug output on both iodined and iodine corresponding to the time when the problem started.
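As a rough illustration of the window size advice above, here is a minimal sketch of how window size, fragment size and round-trip time relate. It assumes the idealised case where at most one full window of fragments is delivered per round trip and ignores loss, encoding overhead and DNS rate limits. The function names are hypothetical and are not taken from the iodine source.

```c
#include <assert.h>
#include <stdint.h>

/* Idealised upper bound: at most `window` fragments of `frag_bytes`
 * bytes each can be delivered per round trip of `rtt_ms` milliseconds. */
uint64_t window_bytes_per_sec(uint32_t window, uint32_t frag_bytes, uint32_t rtt_ms)
{
    assert(rtt_ms > 0);
    return (uint64_t)window * frag_bytes * 1000u / rtt_ms;
}

/* Smallest window that reaches `target` bytes per second at this RTT
 * (rounded up to a whole number of fragments). */
uint32_t window_for_throughput(uint64_t target, uint32_t frag_bytes, uint32_t rtt_ms)
{
    assert(frag_bytes > 0);
    uint64_t per_frag = (uint64_t)frag_bytes * 1000u;
    return (uint32_t)((target * rtt_ms + per_frag - 1) / per_frag);
}
```

Under these assumptions, reaching 40000 bytes per second with 1000-byte fragments needs a window of 8 at 200 ms round-trip time but only 2 at 50 ms, which is why a lower round-trip time lets you shrink -w and -W.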

Features

Most of the important feature additions are listed here.

  • Guaranteed data arrival (no protection from corruption, however if DNS query fails iodine[d] will re-send fragments as required)
  • Command line options have been added to adjust timeouts and sliding window behaviour.
  • Multiple nameservers can be specified to reduce load on a single DNS server.
  • Lazy mode now supports any number (within reasonable limits) of pending queries waiting at the server, adjusted using the downstream window timeout option
  • Client-side statistics are reported at a fixed interval in seconds (specified with the -V option)
  • More fine-grained client control over data compression, server query timeout and other important connection parameters
  • Client side minimum send interval as an attempt to rate-limit connections if using DNS servers which drop queries under high volume
  • Server timeout is adjusted automatically based on target timeout and the round-trip time
  • More fine-grained automatic adjustment of target timeout and immediate mode switching.
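The automatic server timeout adjustment can be pictured with a small sketch. This is only an assumption about the rule: the client log later in this thread reports a round-trip time of 183 ms and a server timeout of 4817 ms, which would be consistent with subtracting the round-trip time from a 5000 ms target timeout. The function name and the clamping to zero are hypothetical and are not taken from the iodine source.

```c
#include <assert.h>

/* Assumed rule: the server should hold a lazy query for the target
 * timeout minus the time the query and reply spend in transit. */
long server_timeout_ms(long target_timeout_ms, long rtt_ms)
{
    assert(target_timeout_ms >= 0 && rtt_ms >= 0);
    long t = target_timeout_ms - rtt_ms;
    return t > 0 ? t : 0; /* never ask for a negative timeout */
}
```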

I may have forgotten to mention some features here, but this should cover most of them.

Protocol Overview

Due to the nature of the sliding window protocol, the entire data transfer protocol needed to be rewritten. The new protocol (800) is detailed in the docs, and although the basic DNS encapsulation is the same, the headers have been more-or-less completely rearranged. Upstream and downstream are functionally equivalent at the sliding window layer, where new data packets (ie from tun device on either client or server) are treated as follows:

  1. Data is optionally compressed (depending on user-specific upstream/downstream compression flags)
  2. Raw or compressed data is then split into a number of fragments depending on the user's maximum fragsize (calculated beforehand during the handshake process)
  3. Each fragment is added to the outgoing window buffer (same for both downstream and upstream) and assigned a unique sequence ID from 0 to 255. The window buffer maintains a pointer to the current fragment which is the "start" of the sending window, and while sending fragments, only the windowsize number of fragments are sent in order from the fragment at the start of the window.
  4. The fragments are sent in order from the start of the window as described above.
  5. When the fragments are received at the other end, they are placed in the receiving window buffer at an offset determined by their sequence ID. This way, out-of-order fragments (very common with load-balanced DNS servers) can be easily handled without dropping them.
  6. The receiving end checks whether it has received the starting fragment, the final fragment and all the fragments in between. If it has, the full data packet is retrieved and the pointer to the start of the next received chunk is moved forwards by the number of fragments.
  7. The received full packet is optionally uncompressed and sent to the tun device.
  8. The receiving end immediately ACKs the fragment using its sequence ID using either a ping or a data packet (both have space for an ACK).
  9. When the ACK is received at the sending side for a fragment, it is marked off in the sending buffer as having been successfully received by the other end and based on this the window can be moved forward and the next few fragments sent.
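The sending side of the steps above can be sketched as follows. This is a simplified illustration, not the code from window.c: fragment payloads, retransmission timers and the receive buffer are left out, and all names (send_window, sw_queue, sw_sendable, sw_ack) are hypothetical. It only shows the 0 to 255 sequence IDs, the window start pointer, and how ACKs let the window slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SEQ_SPACE 256 /* sequence IDs run from 0 to 255 and wrap around */

struct send_window {
    bool acked[SEQ_SPACE]; /* indexed directly by sequence ID */
    uint8_t start;         /* sequence ID at the start of the sending window */
    uint8_t next_seq;      /* sequence ID for the next queued fragment */
    unsigned queued;       /* fragments queued and not yet slid past */
    unsigned windowsize;   /* max fragments in flight at once */
};

void sw_init(struct send_window *w, unsigned windowsize)
{
    assert(windowsize > 0 && windowsize <= SEQ_SPACE / 2);
    memset(w, 0, sizeof(*w));
    w->windowsize = windowsize;
}

/* Step 3: add a fragment to the buffer. Returns its sequence ID, or -1 if full. */
int sw_queue(struct send_window *w)
{
    if (w->queued >= SEQ_SPACE)
        return -1;
    uint8_t seq = w->next_seq++;
    w->acked[seq] = false;
    w->queued++;
    return seq;
}

/* Step 4: list the un-ACKed fragments within the first `windowsize`
 * slots from the window start, in order. Returns how many were written. */
unsigned sw_sendable(const struct send_window *w, uint8_t *out, unsigned max)
{
    unsigned n = 0;
    unsigned span = w->queued < w->windowsize ? w->queued : w->windowsize;
    for (unsigned i = 0; i < span && n < max; i++) {
        uint8_t seq = (uint8_t)(w->start + i);
        if (!w->acked[seq])
            out[n++] = seq;
    }
    return n;
}

/* Step 9: mark a fragment as received, then slide the window start
 * past every leading fragment that has already been ACKed. */
void sw_ack(struct send_window *w, uint8_t seq)
{
    uint8_t offset = (uint8_t)(seq - w->start);
    if (offset >= w->queued)
        return; /* ACK for a fragment that is not in the buffer */
    w->acked[seq] = true;
    while (w->queued > 0 && w->acked[w->start]) {
        w->acked[w->start] = false;
        w->start++;
        w->queued--;
    }
}
```

With a window size of 2 and four queued fragments, an ACK for fragment 1 alone does not move the window; once fragment 0 is also ACKed, the start jumps to 2 and fragments 2 and 3 become sendable.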

Other Information

Any other information is available in the code (I've put in a reasonable amount of hopefully helpful comments so it shouldn't be too hard to understand).

Feel free to ask any questions or make comments on any of the changes. I've done quite a lot of refactoring to clarify various parts of the code or make things simpler.

Thanks for all the great work in making something like iodine, and thanks again for making it open source. It's truly been a pleasure working with it and I hope to be able to contribute something to this project.

frekky avatar Nov 11 '15 14:11 frekky

I tried to compile your version under MinGW64, no luck. I tested on my Linux VM instead (VirtualBox, Ubuntu 14.04.3 64-bit).

From what I see, your version of iodine is not smooth, but can achieve higher downstream...

Is it possible to host multiple topdomains? Let's say the server has multiple WAN IPs.

Anime4000 avatar Nov 12 '15 12:11 Anime4000

you should of added ticket bytes and stuff to increase downstream, that way more queries about let's say its split and sent to each ticket then you can query faster right? send it all like that like data.1.domain.com like that data.2.domain.com and basically like that and added the answers as ip addresses like 3 numbers each . all as valid ip addresses but not valid in a query as a way to get around the iodine queries getting blocked i had the queries blocked fast it was unbelievable that i could not connect to my server,just as a experiment because it probably would not get done on the official git

electricarrows0 avatar Nov 12 '15 12:11 electricarrows0

@electricarrows0 There are lots of new options which can change the behaviour of the program. Perhaps the defaults are a little ridiculous (with a windowsize of 8, sometimes up to 8 queries per second while idle with low DNS server timeouts). To reduce the number of timeouts and other errors that cause connectivity issues, try something like iodine -w 1 -W 1 -I 1, and if you're interested in more details try using -V 5 to get a useful statistics report every 5 seconds.

In case you were using an old iodine command, the iodined topdomain goes first now, followed by any number of DNS servers (used in round-robin). If you specify multiple dns servers you may reduce the load on each server, which may help if your DNS servers drop queries under high load. I also recommend using iodined -c in case you have BADIP errors, since using multiple DNS servers will most likely lead to different source addresses being seen from iodined.
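For illustration, round-robin use of several DNS servers can be sketched like this. The struct and function names are hypothetical and this is not the code from client.c; it only shows each query going to the next server in the list so that the load is spread evenly.

```c
#include <assert.h>
#include <stddef.h>

struct ns_list {
    const char **addrs; /* nameserver addresses given on the command line */
    size_t count;       /* number of entries in addrs */
    size_t next;        /* index of the server to use for the next query */
};

/* Return the address for the next query and advance to the following server. */
const char *ns_next(struct ns_list *l)
{
    assert(l != NULL && l->count > 0);
    const char *addr = l->addrs[l->next];
    l->next = (l->next + 1) % l->count;
    return addr;
}
```

Because consecutive queries leave through different resolvers, the server is likely to see different source addresses, which is the reason for the iodined -c suggestion above.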

@Anime4000 As soon as I have time I'll get a MinGW build environment set up and make sure Windows compatibility is working properly.

If you're having issues with DNS servers behaving strangely or producing lots of errors, try using the above mentioned options (ie. reducing the upstream/downstream window size and the target timeout) and use -V to show more stats which will be helpful if you wish to find the best values for certain options (probably target timeout -I or up/downstream window sizes). If possible, try setting all the "Fine tuning options" and the "options to try if connection doesn't work" manually to low values and increase them while testing to see if you get better results.

Finally, in terms of connection stability, use the -V option to see your connection round-trip time and try setting the downstream fragment timeout -j to something smaller, which can help if you have low ping times normally but frequent large spikes (this is probably caused by packet loss; again, check the stats using -V to see all that useful information).

At this point in time, connecting to multiple iodined domains would be quite tricky, unless they were all using the same iodined server (in which case it would be relatively simple). I'll consider adding that as a feature in this fork later. Using multiple iodined servers would be harder since the client would have to login and somehow load balance between all of them at the IP level (not likely feasible).

frekky avatar Nov 12 '15 13:11 frekky

@frekky well i'm saying it adjusts as the downstream or upstream increases by creating ticket bytes or to have more users on the same dns ip and all other stuff

electricarrows0 avatar Nov 12 '15 13:11 electricarrows0

@electricarrows0 I'm not completely sure about what you're suggesting but if it was to increase the number of pending queries when more downstream data is available and use only a single query when idle, that would be quite a useful feature to reduce DNS load.

In terms of having multiple users connected to iodined using the same internal IP (such as on the tun device), the issues with load balancing would be quite tricky to handle especially without separating TCP connections etc. The purpose of using a sliding window protocol in this case was to prevent the need of using multiple iodine connections at once (and load balancing between them) and use only a single client with higher throughput.

If you're having issues with certain types of requests being blocked, try changing the DNS type to something else like SRV, CNAME, MX or A (using iodine -T option).

Could you elaborate on what you mean by "ticket bytes"?

Thanks

frekky avatar Nov 12 '15 14:11 frekky

@frekky from here http://heyoka.sourceforge.net/ http://heyoka.sourceforge.net/heyoka-shakacon2009.pdf you might find the ticket bytes thing it basically means like for a slave server but even better not to do that and instead send extra queries is to just use it for speeding up the connection by extra question queries

electricarrows0 avatar Nov 12 '15 14:11 electricarrows0

@frekky you could add spoofing to protect the dns tunnel from getting detected

electricarrows0 avatar Nov 12 '15 14:11 electricarrows0

iOS - iodine 0.6 under cellular network: at my end, most stable around 24KB/s (3G, H+ signal)

Back to Ubuntu, using this fork, I can get more speed (-w 128 -W 128 & no compression), but when I wget a 4MB test file, iodine stops responding half-way... :trollface:

I found out that using CloudFlare DNS management is a bad idea, so I set up the DNS server & iodine on the same server.

Under CloudFlare: 2KB/s ~ 5KB/s
Under self hosted DNS: 24KB/s ++

Anime4000 avatar Nov 12 '15 14:11 Anime4000

This looks interesting, I will review it when I get the time!

yarrick avatar Nov 15 '15 19:11 yarrick

@frekky file err.h is OpenSSL include\openssl\err.h file? my mingw64 didn't have this file, can you send err.h ?

Anime4000 avatar Nov 15 '15 21:11 Anime4000

Turns out there already was a fix for those, I'd just forgotten to not include err.h when compiling for win32.

frekky avatar Nov 16 '15 13:11 frekky

@frekky try checking src/iodined.c at line 31? It should be:

#ifndef WINDOWS32
 #include <err.h>
#endif

Anime4000 avatar Nov 16 '15 16:11 Anime4000

@frekky to be more awesome, natively add route :+1: currently I use script to do that

SERVER_IP → GATEWAY_IP (Direct VPN)
DNS_IP → GATEWAY_IP (DNS Tunnel)

so, Iodine wont exit when VPN sessions under it

iOS script, capture current gateway & dns

route -n add -net $SERVER_IP $GW_IP
route -n add -net $DNS_IP $GW_IP

Windows script, capture gateway ip

route add %SERVER_IP% mask %MASK% %GW_IP%
route add %DNS_IP% mask %MASK% %GW_IP%

Anime4000 avatar Nov 19 '15 03:11 Anime4000

Builds on Mac OS X Yosemite 10.10.15 x64.

One compile warning:

CC client.c
client.c:574:25: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare]
    if (send_query_sendcnt >= 0 && send_query_sendcnt < 100 &&
        ~~~~~~~~~~~~~~~~~~ ^  ~

cpatulea avatar Nov 19 '15 04:11 cpatulea

With 'make debug', looks like Clang doesn't support -Og:

$ make debug
OS is DARWIN, arch is x86_64
CC tun.c
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'

cpatulea avatar Nov 19 '15 04:11 cpatulea

This has also uncovered a host of format string warnings:

window.c:83:3: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
WDEBUG("Resizing window buffer with things still in it! This will cause problems!");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./window.h:65:3: note: expanded from macro 'WDEBUG'
TIMEPRINT("WINDOW-DEBUG ", __FILE__, __LINE__);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./common.h:90:55: note: expanded from macro 'TIMEPRINT'
fprintf(stderr, "%03ld.%03ld ", currenttime.tv_sec, currenttime.tv_usec / 1000);
~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
window.c:147:4: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
WDEBUG("Dropping frag with seqID %u: not in window (%u-%u)", f->seqID, startid, endid);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./window.h:65:3: note: expanded from macro 'WDEBUG'
TIMEPRINT("WINDOW-DEBUG ", __FILE__, __LINE__);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./common.h:90:55: note: expanded from macro 'TIMEPRINT'
fprintf(stderr, "%03ld.%03ld ", currenttime.tv_sec, currenttime.tv_usec / 1000);
~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
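One common way to silence this kind of -Wformat warning is to cast the timeval fields to the type the format string expects, since tv_usec is not a long on every platform. The sketch below is a standalone illustration of that idea and not necessarily the fix applied in the fork. The helper name format_timestamp is hypothetical; only the format string is taken from the warning output above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Write "sec.msec " into buf. The explicit casts make both arguments
 * match %ld even where tv_sec or tv_usec is narrower than long. */
int format_timestamp(char *buf, size_t len, const struct timeval *tv)
{
    assert(buf != NULL && tv != NULL);
    return snprintf(buf, len, "%03ld.%03ld ",
                    (long) tv->tv_sec, (long) (tv->tv_usec / 1000));
}
```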

cpatulea avatar Nov 19 '15 04:11 cpatulea

ssh <server_ip> cat /dev/urandom: 1.94 MiB/s
iodine <server_ip> (raw mode): 1.87 MiB/s, iodined CPU ~27%
iodine -r <server_ip> (
  DNS mode,
  Switching upstream to codec Base128,
  Switching server options: lazy mode, downstream codec Raw, compression enabled...,
  Setting downstream fragment size to max 1186...
  Determined round-trip time of 183 ms, server timeout of 4817 ms
): 400 KiB/s, iodined CPU ~8%
iodine -r <isp_dns_ip>:
  Opened utun0
Opened IPv4 UDP socket
Sending DNS queries for <domain> to <isp_dns_ip>
Using DNS type TXT queries
Version ok, both using protocol v 0x00000800. You are user #1
Setting IP of utun0 to 10.0.53.3
Adding route 10.0.53.0/24 to 10.0.53.3
add net 10.0.53.0: gateway 10.0.53.3
Setting MTU of utun0 to 1130
Server tunnel IP is 10.0.53.1
Skipping raw mode
Using EDNS0 extension
Switching upstream to codec Base128
Server switched upstream to codec Base128
Autodetecting downstream codec (use -O to override)
Switching server options: lazy mode, downstream codec Raw, compression enabled...
Switched server options successfully. (rlc)
Autoprobing max downstream fragment size... (skip with -m fragsize)iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.768 not ok.. iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.384 not ok.. iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.192 not ok.. iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.96 not ok.. iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.48 not ok.. iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.iodine: Got FORMERR as reply: server does not understand our request
.24 not ok.. iodine: Got FORMERR as reply: server does not understand our request
.iodine: Too many error replies, not logging any more.
..12 not ok.. ...6 not ok.. ...3 not ok.. ...2 not ok.. iodine:
found no accepted fragment size.
iodine: try setting -M to 200 or lower, or try other -T or -O options.

So my ISP DNS somehow rejects new protocol. Tested master, it still works.

cpatulea avatar Nov 19 '15 05:11 cpatulea

In mingw64 (GCC 5.1) it prints a lot of errors and the iodine server cannot compile. Compile Log Here!

Anime4000 avatar Nov 29 '15 19:11 Anime4000

@Anime4000 You will need Git CLI installed and available in the system path. Make sure you can run "git" in a normal command prompt before trying to build. Alternatively, in src/Makefile, change the line

HEAD_COMMIT = `git rev-parse --short HEAD`

to

HEAD_COMMIT = "iodine git"

frekky avatar Nov 30 '15 14:11 frekky

@frekky nice! I just tested on DigitalOcean (Ubuntu Server) <> Home (Windows Client) I can get ~224KB/s, it almost 2mbps by using default primary & secondary DNS provided by ISP

don't use cloudflare DNS, I tried it, cloudflare block multiple request, need self hosted DNS like BIND9

Anime4000 avatar Nov 30 '15 16:11 Anime4000

I have suggestion, can you add --preset for high speed and low latency mode?

Anime4000 avatar Dec 08 '15 02:12 Anime4000

@Anime4000 Good idea! Only problem is that I don't really know what works as "high speed/low latency" considering that depends entirely on your internet connection. Could you perhaps post what works for you under various conditions?

frekky avatar Dec 08 '15 05:12 frekky

In my tests I used -M 100 -w 128 -W 128 and got nearly 2mbps, at the cost of a lot of dropped packets. Tried -w 256 -W 256, no data...

maybe --preset, make iodine negotiate which is ideal for -wW or any args for high bandwidth/throughput.

for low latency ideal for Messenger or Voice, I often use on smartphone.

Anime4000 avatar Dec 08 '15 11:12 Anime4000

@frekky is possible to add localhost listener for non-root access/without tun device?

Anime4000 avatar Dec 19 '15 08:12 Anime4000

@Anime4000 I was planning to add something like that when I get time. Probably would be more like SSH ProxyCommand compatibility where data is from stdin/stdout, although doing something similar to SSH -L or -R options would also be useful. At any rate it makes sense to let something like SSH handle compression, encryption and data transfer (ie using SOCKS proxy with -D or local/remote forwards, even as a tun device) rather than trying to implement the same manually, so I'll most likely just implement the stdin/stdout pipe functionality with the server end connecting to a specified local port.

This also would reduce overhead and increase throughput significantly since SSH would then be able to run without the IP or TCP overhead in the tun device and would leave all flow control to iodine itself.

frekky avatar Dec 19 '15 12:12 frekky

@Anime4000 There is now a --preset or -Y option, so that way it's easier to use appropriate values for various situations without having to find out what all the options mean. At this point the presets aren't configurable except by modifying iodine.c.

Would you be able to test with window sizes less than 32?

As it turns out, the server actually doesn't process more than 31 pending requests (unless you've changed QMEM_LEN in server.h to a higher number). I'd be interested to see if you get any performance boosts from that.

frekky avatar Jan 09 '16 14:01 frekky

@frekky I tried the latest commit and it is not working, no download stream received, but... using an old commit works just flawlessly.

What change between old & new commit?

Anime4000 avatar Jan 17 '16 23:01 Anime4000

@Anime4000 Quite a lot changed, now working towards adding iodine ProxyCommand mode and modifying some parts of the protocol. Stick to using an old commit if it worked better since it might be a while before it works (at all) again.

frekky avatar Jan 18 '16 10:01 frekky

Can you make a preset for the old commit? Also... is it possible to open more IPv4 UDP sockets (instead of 3) to increase speed? Another question... is the -m value after subtracting 6 or before? 1176-6=1170

UPDATE: After trying the new commit, I need to find the right -M value, currently using 250.

Anime4000 avatar Jan 18 '16 10:01 Anime4000

@Anime4000 One of the reasons so much has changed is to introduce presets with a global client/server "instance".

Opening more UDP sockets wouldn't be useful at all I'm afraid. It would be better to run multiple instances of iodine simultaneously and load-balance between them somehow, although that was entirely the reason I modified iodine in the first place (to avoid doing that).

-m sets the max downstream fragsize, which includes the 6 or so bytes used for the header, so the actual size of the encoded data is the -m value minus the size of the packet header (probably more than 6 bytes anyway).
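The arithmetic can be written as a tiny helper. The header length is passed in as a parameter because its exact size is not stated here; 6 is only the figure used in the question above, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes left for encoded data in one downstream fragment once the
 * packet header has been accounted for. */
size_t frag_payload_bytes(size_t fragsize, size_t header_len)
{
    assert(header_len <= fragsize);
    return fragsize - header_len;
}
```

With -m 1176 and an assumed 6-byte header this gives 1170 bytes of encoded data per fragment; a larger real header would reduce that further.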

frekky avatar Jan 18 '16 12:01 frekky