TCP checksum computation: improve performance

Open hannesm opened this issue 2 years ago • 6 comments

from #30

siege -c 1 -r 1000 on my laptop (with a retreat unikernel on the other side) results in:

  • mirage-tcpip (which doesn't validate any checksums): ~2000 req/s
  • current main (https://github.com/robur-coop/utcp/commit/cea1509fdd711f32522d78ac4ba3ea4f1473f003): ~1400 req/s
  • this head (https://github.com/robur-coop/utcp/commit/d639adc1fa4a354ac5ce6d50a7dddb89059684a8): ~1500 req/s
  • utcp without checksum verification: ~2300 req/s

It may be worth investigating the use of C or assembly (see e.g. https://blogs.igalia.com/dpino/2018/06/14/fast-checksum-computation/ and especially https://github.com/snabbco/snabb/pull/899).

hannesm avatar Nov 28 '23 13:11 hannesm

with 1b341a8 (avoiding bounds checks) we get ~1800 req/s

hannesm avatar Dec 03 '23 12:12 hannesm

The article linked in the PR (http://locklessinc.com/articles/tcp_checksum/) is very interesting and, after a quick test, I can confirm that it computes the csum very fast :) So far the result has the wrong endianness for me (e.g. CS=0x7EC7 instead of 0xC77E), but after 150000 iterations with random lengths in [8B, 63kB]:

  • C 32b word csum = 22.08 us
  • C 64b word csum = 13.04 us
  • checksum15 = 3.12 us
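For reference, a byte-swapped result such as 0x7EC7 versus 0xC77E is consistent with summing the data as native little-endian words instead of network-order words: per RFC 1071 the one's-complement sum is byte-order independent up to a final byte swap. Below is a minimal C sketch of that effect. The function names (csum_ref, csum_native64, csum_net64) are hypothetical, and this is neither the locklessinc code nor what Checksum.digest does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: sums 16-bit big-endian (network order) words. */
uint16_t csum_ref(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Word-at-a-time version: sums native-order 64-bit words with end-around
 * carry. On a little-endian host the result is byte-swapped relative to
 * csum_ref. */
uint16_t csum_native64(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0, w;

    assert(buf != NULL || len == 0);
    while (len >= 8) {
        memcpy(&w, buf, 8);             /* safe unaligned load */
        sum += w;
        if (sum < w)                    /* overflow: add the carry back in */
            sum++;
        buf += 8;
        len -= 8;
    }
    if (len > 0) {                      /* tail, zero padded */
        w = 0;
        memcpy(&w, buf, len);
        sum += w;
        if (sum < w)
            sum++;
    }
    sum = (sum & 0xffffffff) + (sum >> 32);   /* fold 64 -> 32 bits */
    sum = (sum & 0xffffffff) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);       /* fold 32 -> 16 bits */
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Reads the native result back as two bytes in memory order, which yields
 * the network-order checksum on both little- and big-endian hosts. */
uint16_t csum_net64(const uint8_t *buf, size_t len)
{
    uint16_t r = csum_native64(buf, len);
    uint8_t b[2];

    memcpy(b, &r, 2);
    return (uint16_t)((b[0] << 8) | b[1]);
}
```

With the example bytes from RFC 1071 (00 01 f2 03 f4 f5 f6 f7), csum_ref and csum_net64 both give 0x220d, while csum_native64 on a little-endian machine gives 0x0d22. If the fast routine shows the same pattern, writing its result into the header without any conversion may already put the right bytes on the wire.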

palainp avatar Dec 04 '23 15:12 palainp

Interesting numbers @palainp -- so do you have a comparison to the current implementation in this library (Checksum.digest)?

If you happen to have a comparison table and the C code also integrated into this library, I'd appreciate a PR (if the C code is much faster).

hannesm avatar Dec 05 '23 12:12 hannesm

Unfortunately this was only a local test. I'll try to add a C binding here and open a PR when it's done, if it's faster. (I suppose it might be hard to bind against or maintain the asm version?)

palainp avatar Dec 05 '23 14:12 palainp

well, asm -- why not? ;) Considering that lots of deployed systems are amd64, we can have special assembly for that. E.g. mirage-crypto has feature detection to decide which code path to use. What is crucial from that experience is that, while it is fine to always ship the assembly, a binary built on one system (with specific CPU features) must not be restricted to running on systems that have the very same features (i.e. https://github.com/mirage/mirage-crypto/pull/53 was a great achievement).

This matters because, especially with unikernels but also in general, I prefer to keep the build machine separate from the machine that runs the binary.

Embedding asm code is also best done using C mnemonics (see the mirage-crypto repository for an example).
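To make the runtime feature detection idea concrete, here is a rough C sketch of selecting a checksum routine on the machine that runs the binary rather than the one that built it. It is only an illustration under assumptions: all function names are hypothetical, the "optimised" variant is a placeholder that forwards to the portable code, and the detection uses the GCC/Clang builtin __builtin_cpu_supports, which is not necessarily the mechanism mirage-crypto uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*csum_fn)(const uint8_t *, size_t);

/* Portable fallback: 16-bit network-order words, always compiled in. */
uint16_t csum_generic(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Placeholder for an amd64-specific routine (hypothetical). A real one
 * would be built with the matching target attribute or as a separate asm
 * file, so that the rest of the binary stays runnable on older CPUs. */
uint16_t csum_avx2_placeholder(const uint8_t *buf, size_t len)
{
    return csum_generic(buf, len);
}

/* Picks an implementation based on the CPU the program is running on. */
csum_fn csum_select(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();               /* GCC/Clang builtin, x86 only */
    if (__builtin_cpu_supports("avx2"))
        return csum_avx2_placeholder;
#endif
    return csum_generic;
}

/* Public entry point: resolves the implementation on first use.
 * Not synchronised; concurrent first calls would select the same pointer. */
uint16_t csum(const uint8_t *buf, size_t len)
{
    static csum_fn impl = NULL;

    if (impl == NULL)
        impl = csum_select();
    return impl(buf, len);
}
```

The point of the indirection is that both code paths are always shipped and the choice is made at run time, so a binary built on a machine with newer CPU features still runs on one without them.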

hannesm avatar Dec 05 '23 15:12 hannesm