swift-gif
Multicore encoding
This repository could be much faster if it encoded each frame in parallel. That is necessary to achieve acceptable latency in a production renderer, where videos are currently encoded as GIFs (640x640, 20 FPS). I am forking swift-gif and will update this thread once it has been optimized.
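A minimal sketch of the frame-level parallelism proposed above, assuming each frame can be encoded independently of the others. `encodeFrame` is a hypothetical placeholder, not swift-gif's API; `DispatchQueue.concurrentPerform` runs one iteration per frame across the available cores.

```swift
import Foundation

// Hypothetical placeholder for a per-frame GIF encoder (not swift-gif's API).
// Here it just halves every byte so the result is easy to check.
func encodeFrame(_ pixels: [UInt8]) -> [UInt8] {
    pixels.map { $0 / 2 }
}

// Encodes all frames concurrently, one concurrentPerform iteration per frame.
// Each iteration writes only to its own output slot, so no lock is needed.
func encodeFramesInParallel(_ frames: [[UInt8]]) -> [[UInt8]] {
    var encoded = [[UInt8]](repeating: [], count: frames.count)
    encoded.withUnsafeMutableBufferPointer { out in
        DispatchQueue.concurrentPerform(iterations: frames.count) { i in
            out[i] = encodeFrame(frames[i])
        }
    }
    return encoded
}
```

Because the results land in a preallocated array indexed by frame number, the output stays in frame order no matter which core finishes first.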
Using the GIF in the tests as a benchmark, I first reduced a bottleneck in the single-core code path: replacing the `[Int]` index arrays with `[Int32]` reduced execution time by 20%.
| Before (`[Int]`) | After (`[Int32]`) |
|---|---|
| 17.1 s | 13.7 s |
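The thread doesn't show which index arrays were changed, so here is a hypothetical illustration of the idea: a nearest-color lookup that stores one palette index per pixel in `[Int32]` instead of `[Int]`. `paletteIndices` and the grayscale palette are assumptions made for the example. A GIF color table has at most 256 entries, so `Int32` is more than wide enough, and on 64-bit platforms it is half the size of `Int`.

```swift
// Hypothetical nearest-color quantizer (not swift-gif's code): for each 8-bit
// gray pixel, store the index of the closest palette entry as Int32.
func paletteIndices(pixels: [UInt8], palette: [UInt8]) -> [Int32] {
    var indices = [Int32](repeating: 0, count: pixels.count)
    for (i, pixel) in pixels.enumerated() {
        var best: Int32 = 0
        var bestDistance = Int32.max
        for (j, entry) in palette.enumerated() {
            let distance = abs(Int32(pixel) - Int32(entry))
            if distance < bestDistance {
                bestDistance = distance
                best = Int32(j)
            }
        }
        indices[i] = best
    }
    return indices
}

// Assuming one index per pixel, a 640x640 frame needs 409,600 indices. On a
// 64-bit platform that is 3,276,800 bytes as [Int] but 1,638,400 as [Int32].
let indexCount = 640 * 640
print("Int:", indexCount * MemoryLayout<Int>.stride, "bytes")
print("Int32:", indexCount * MemoryLayout<Int32>.stride, "bytes")
```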
Next, I parallelized across all CPU cores, adding the 8 performance cores first and the 2 efficiency cores last. Then I re-ran the benchmark with a work-redistribution algorithm that balances work across cores. Speedup is measured against the 13.7 s single-core time. Parallel efficiency (speedup divided by core count) is a metric usually used to measure strong scaling on supercomputers; for this metric, each efficiency core counts as half a core.
| Cores | Time (no load balancing) | Time (load balanced) | Speedup | Parallel Efficiency |
|---|---|---|---|---|
| 2 | 7.91 s | 7.12 s | $1.92\times$ | 96% |
| 3 | 5.35 s | 4.95 s | $2.77\times$ | 92% |
| 4 | 4.21 s | 3.74 s | $3.66\times$ | 92% |
| 5 | 3.35 s | 3.07 s | $4.46\times$ | 89% |
| 6 | 3.00 s | 2.62 s | $5.24\times$ | 87% |
| 7 | 2.72 s | 2.34 s | $5.85\times$ | 84% |
| 8 | 2.40 s | 2.06 s | $6.65\times$ | 83% |
| 9 | 2.34 s | 1.99 s | $6.88\times$ | 81% |
| 10 | 2.15 s | 1.90 s | $7.21\times$ | 80% |
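The table's speedup and efficiency columns are reproduced by the following definitions (inferred from the numbers; the symbols are introduced only for this note). $T_1 = 13.7$ s is the single-core `[Int32]` time, $T_N$ is the load-balanced time on $N$ cores, and $N_P$, $N_E$ are the numbers of performance and efficiency cores in use:

$$
S_N = \frac{T_1}{T_N}, \qquad
N_\text{eff} = N_P + \tfrac{1}{2} N_E, \qquad
E_N = \frac{S_N}{N_\text{eff}}
$$

For example, with all 10 cores (8 performance + 2 efficiency):

$$
S_{10} = \frac{13.7\ \text{s}}{1.90\ \text{s}} \approx 7.21, \qquad
N_\text{eff} = 8 + \tfrac{1}{2} \cdot 2 = 9, \qquad
E_{10} = \frac{7.21}{9} \approx 80\%
$$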
Overall, this was a +801% speedup: from 17.1 s (original single-core code) to 1.90 s, about 9× faster.
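The thread doesn't describe the work-redistribution algorithm, so the following is a generic dynamic-scheduling sketch rather than the author's method: a fixed number of workers repeatedly claim the next unencoded frame from a lock-protected counter, so cores that finish early (for example, performance cores) simply take more frames. `WorkQueue` and `encodeDynamically` are hypothetical names, and `encode` stands in for a real per-frame encoder.

```swift
import Foundation

// Hands out frame indices one at a time. The lock makes claim() safe to call
// from many threads; @unchecked Sendable marks that the state is guarded manually.
final class WorkQueue: @unchecked Sendable {
    private let lock = NSLock()
    private var next = 0
    private let count: Int

    init(count: Int) { self.count = count }

    // Returns the next unclaimed index, or nil once every index has been handed out.
    func claim() -> Int? {
        lock.lock()
        defer { lock.unlock() }
        guard next < count else { return nil }
        next += 1
        return next - 1
    }
}

// Starts `workers` loops; each keeps claiming frames until none are left, so
// faster cores end up encoding more frames than slower ones.
func encodeDynamically(_ frames: [[UInt8]], workers: Int,
                       encode: ([UInt8]) -> [UInt8]) -> [[UInt8]] {
    let queue = WorkQueue(count: frames.count)
    var output = [[UInt8]](repeating: [], count: frames.count)
    output.withUnsafeMutableBufferPointer { out in
        DispatchQueue.concurrentPerform(iterations: workers) { _ in
            while let i = queue.claim() {
                out[i] = encode(frames[i])
            }
        }
    }
    return output
}
```

Claiming one frame at a time costs one lock acquisition per frame, which is likely small next to encoding a 640x640 frame; much finer-grained work items would call for a cheaper counter.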
Pretty cool, feel free to open a PR. I'm actually surprised that the `Int` → `Int32` change made such a big difference; I would not have expected that, at least not on modern 64-bit CPUs.
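My first theory is that the compiler can vectorize the loop and perform more comparisons per instruction, since a SIMD register holds twice as many 32-bit values as 64-bit ones. A second theory is slightly reduced memory bandwidth usage, because each index takes half as many bytes.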
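To illustrate theory 1 (this is not a measurement of what the compiler actually emitted): a 128-bit SIMD register, such as an ARM NEON register, holds four `Int32` lanes but only two `Int64` lanes, so one vector comparison checks twice as many 32-bit indices. The hypothetical `countBelow` below makes that explicit with Swift's `SIMD4<Int32>`; the auto-vectorizer may produce similar or wider code from a plain scalar loop.

```swift
// Counts values below a threshold, comparing four Int32 lanes per step.
// The same 16 bytes would hold only two Int64 lanes.
func countBelow(_ values: [Int32], threshold: Int32) -> Int {
    let limit = SIMD4<Int32>(repeating: threshold)
    var counts = SIMD4<Int32>(repeating: 0)
    var i = 0
    while i + 4 <= values.count {
        let lanes = SIMD4<Int32>(values[i], values[i + 1], values[i + 2], values[i + 3])
        // Add 1 in every lane whose value is below the threshold.
        counts &+= SIMD4<Int32>(repeating: 0).replacing(with: 1, where: lanes .< limit)
        i += 4
    }
    var total = Int(counts.wrappedSum())
    // Scalar tail for the remaining 0-3 elements.
    while i < values.count {
        if values[i] < threshold { total += 1 }
        i += 1
    }
    return total
}
```

Halving the element size also halves the bytes loaded per index, which is what theory 2 relies on.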