Improve multithreading support
Currently we have some well-known issues:
- rayon can be better
- compute_lookahead_data should happen in a separate thread

The upcoming rayon solves its problems neatly:

And we have compute_block_importances and compute_lookahead_data that could happily live in a separate thread and not stall the rest of the encoding.
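A minimal sketch of that idea, using only `std::thread` and `std::sync::mpsc`: a worker thread receives frames over a channel and sends analysis results back, so the calling thread is free to keep encoding. Everything here (`LookaheadData`, `analyze_frame`, `encode_with_background_lookahead`) is a hypothetical stand-in, not rav1e's actual types or functions, and the "analysis" is just a sum of the samples.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the per-frame lookahead output.
#[derive(Debug, PartialEq)]
pub struct LookaheadData {
    pub frame_index: usize,
    pub importance_sum: u64,
}

/// Placeholder analysis: sums the samples. Not rav1e's real computation.
fn analyze_frame(frame_index: usize, samples: &[u8]) -> LookaheadData {
    let importance_sum = samples.iter().map(|&s| u64::from(s)).sum();
    LookaheadData { frame_index, importance_sum }
}

/// Feeds frames to a worker thread and collects its results.
/// The caller never runs `analyze_frame` itself.
pub fn encode_with_background_lookahead(frames: Vec<Vec<u8>>) -> Vec<LookaheadData> {
    let (frame_tx, frame_rx) = mpsc::channel::<(usize, Vec<u8>)>();
    let (data_tx, data_rx) = mpsc::channel::<LookaheadData>();

    // The worker owns the receiving end for frames and the sending end for results.
    let worker = thread::spawn(move || {
        for (index, samples) in frame_rx {
            if data_tx.send(analyze_frame(index, &samples)).is_err() {
                break; // The receiver is gone, so there is nobody to report to.
            }
        }
    });

    for (index, samples) in frames.into_iter().enumerate() {
        frame_tx
            .send((index, samples))
            .expect("lookahead worker exited early");
        // A real encoder would continue with other work here instead of waiting.
    }
    drop(frame_tx); // Closing the channel lets the worker's loop finish.

    // The iterator ends once the worker has dropped its sender.
    let results: Vec<LookaheadData> = data_rx.iter().collect();
    worker.join().expect("lookahead worker panicked");
    results
}
```

With a single worker the results arrive in submission order; a pool of workers would need the `frame_index` to reorder them.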
I believe compute_block_importances could be split across multiple threads, similar to tiling.
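As a rough illustration of such a split, the sketch below divides a row-major buffer of per-block values into bands of block rows and fills each band on its own scoped thread (`std::thread::scope`, stable since Rust 1.63). The function name, the band layout, and the per-block formula are all assumptions made for the example; it also assumes each band can be computed independently, which may not hold for the real block importances.

```rust
use std::thread;

/// Fills `importances` (one value per block, row-major) using one thread
/// per band of `rows_per_band` block rows. Hypothetical helper; the value
/// written per block is a placeholder, not a real importance metric.
pub fn compute_importances_in_bands(
    importances: &mut [f32],
    blocks_per_row: usize,
    rows_per_band: usize,
) {
    assert!(blocks_per_row > 0 && rows_per_band > 0);
    let band_len = blocks_per_row * rows_per_band;

    thread::scope(|scope| {
        // `chunks_mut` hands out disjoint mutable slices, so no locking is needed.
        for (band_index, band) in importances.chunks_mut(band_len).enumerate() {
            scope.spawn(move || {
                let first_block = band_index * band_len;
                for (offset, value) in band.iter_mut().enumerate() {
                    // Placeholder: store the global block index as the "importance".
                    *value = (first_block + offset) as f32;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

Spawning one OS thread per band keeps the example dependency-free; in practice a thread pool would likely be a better fit.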
Ah, compute_block_importances should be tiled anyway, or the block importances will be inaccurate for blocks that move across tile boundaries.
Edit: This might be incorrect, because it looks like we do use tiles in compute_lookahead_motion_vectors, which is one of the data points that block importances are based on and the one that would affect this.
https://github.com/xiph/rav1e/commit/8cc2268f8ae16451d79d58f3b1bee32a33e43f0c and https://github.com/xiph/rav1e/commit/1e0a8ce78a8db533bb813641dbd4e2a17c30d48b resulted in a ~21% speedup on Chimera 1080p with 8 tiles and >8 threads.
Worth noting that there is what is practically a drop-in replacement for Rayon that performs better:
https://github.com/orxfun/orx-parallel