otclient Optimize XTEA encryption

Pushing the same code that is already in TFS. Tired of seeing this being sold in plain sight :stuck_out_tongue_closed_eyes:

"This patch enables auto-vectorization for XTEA encryption and decryption, exploring vector (SSE, AVX) instructions to be auto-enabled by the compiler. Should render up to 16x faster encryption/decryption throughput on AVX2, 8x on AVX and 4x on SSE/Neon."

"Out with all that optimal block size width, let the compiler figure it out. Reduces the amount of code by half and increases performance by ~20%."
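For readers who have not seen the TFS change, here is a minimal sketch of the general idea, not the actual patch. It assumes standard 32-round XTEA, runs the round loop on the outside and the block loop on the inside so that each inner iteration is independent, and precomputes the per-round key material once. All names (`expand_key`, `xtea_encrypt`, `xtea_decrypt`, `RoundKeys`) are hypothetical, and whether the compiler actually vectorizes the inner loop depends on the compiler, flags and target.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kDelta = 0x9E3779B9u;  // standard XTEA constant
constexpr std::size_t kRounds = 32;            // standard XTEA round count

using Key = std::array<std::uint32_t, 4>;
using RoundKeys = std::array<std::uint32_t, 2 * kRounds>;

// Precompute "sum + key[...]" for both halves of every round, so the
// per-block loops below contain no key lookups (hypothetical helper).
RoundKeys expand_key(const Key& key) {
    RoundKeys rk{};
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < kRounds; ++i) {
        rk[2 * i] = sum + key[sum & 3];
        sum += kDelta;
        rk[2 * i + 1] = sum + key[(sum >> 11) & 3];
    }
    return rk;
}

// Encrypt in place. Each block is two 32-bit words; a trailing odd word is
// left untouched. The inner loop has no dependency between blocks, which is
// what gives the compiler a chance to auto-vectorize it.
void xtea_encrypt(std::vector<std::uint32_t>& words, const RoundKeys& rk) {
    const std::size_t n = words.size() - words.size() % 2;
    for (std::size_t i = 0; i < kRounds; ++i) {
        const std::uint32_t k0 = rk[2 * i];
        const std::uint32_t k1 = rk[2 * i + 1];
        for (std::size_t pos = 0; pos < n; pos += 2) {
            std::uint32_t v0 = words[pos];
            std::uint32_t v1 = words[pos + 1];
            v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ k0;
            v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ k1;
            words[pos] = v0;
            words[pos + 1] = v1;
        }
    }
}

// Decrypt in place by undoing the rounds in reverse order.
void xtea_decrypt(std::vector<std::uint32_t>& words, const RoundKeys& rk) {
    const std::size_t n = words.size() - words.size() % 2;
    for (std::size_t i = kRounds; i > 0; --i) {
        const std::uint32_t k0 = rk[2 * i - 2];
        const std::uint32_t k1 = rk[2 * i - 1];
        for (std::size_t pos = 0; pos < n; pos += 2) {
            std::uint32_t v0 = words[pos];
            std::uint32_t v1 = words[pos + 1];
            v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ k1;
            v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ k0;
            words[pos] = v0;
            words[pos + 1] = v1;
        }
    }
}
```

The sketch works on a buffer of 32-bit words to stay short; a real network buffer would be bytes and would need explicit handling of byte order and padding to a multiple of 8 bytes.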

Reasoning and detailed study

Jan 30 '21 09:01 ranisalt

Let's hold this; there is a possible optimization in otland/forgottenserver#3406

Apr 09 '21 13:04 ranisalt

@ranisalt since #3406 was merged, do you want to close this, or is there still something useful here?

Jun 26 '21 15:06 DSpeichert

This was already merged upstream, so I suggest we rebase and then just add the precomputed key changes.

Jun 27 '21 02:06 ranisalt