Implement Table Binary Search in OpenCL
Currently, the binary search during table lookups is done on the CPU. Investigate whether doing it on the GPU is faster, and if so, update the lookup code to do that instead.
Well, I experimented with binary searching on the GPU. It turns out to be massively slower on the NVIDIA RTX 2070: searching through one table takes around 15 minutes, versus just 10 seconds on the CPU, which is roughly 90 times slower. Since the two aren't anywhere near the same ballpark, it seems safe to assume other GPUs aren't going to do any better.
If anyone wants to experiment with this on their own, see the gpu_binary_search branch. It's not 100% finished, but it works well enough to try out optimizations. If anyone achieves a notable breakthrough, polishing it up shouldn't be hard.
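
For anyone comparing timings, here is a minimal sketch in C of the kind of per-lookup work being measured: an ordinary binary search over a sorted table. This is not the project's actual lookup code. The function name `table_binary_search` and the table layout (a flat, ascending-sorted array of 64-bit values) are assumptions made purely for illustration; the real table format may differ.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the project.
 * Assumes `table` holds `n` 64-bit values sorted in ascending order.
 * Returns the index of `key`, or -1 if `key` is not present.
 */
long table_binary_search(const uint64_t *table, size_t n, uint64_t key)
{
    size_t lo = 0;
    size_t hi = n;                        /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;  /* midpoint without overflowing lo + hi */

        if (table[mid] < key)
            lo = mid + 1;                 /* key can only be in the upper half */
        else if (table[mid] > key)
            hi = mid;                     /* key can only be in the lower half */
        else
            return (long)mid;             /* exact match */
    }
    return -1;                            /* range is empty: key not found */
}
```

Each lookup touches only about log2(n) entries, so a fair GPU comparison would presumably need to batch many independent lookups per kernel launch rather than offload them one at a time.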