Multicore encoding

Open philipturner opened this issue 2 years ago • 3 comments

This repository could be much faster if it encoded each frame in parallel. That is necessary to achieve acceptable latency in a production renderer, where videos are currently encoded as GIFs (640x640, 20 FPS). I am forking swift-gif and will update this thread once it has been optimized.

philipturner, Aug 22 '23 21:08

Using the GIF in the tests as a benchmark, I first reduced a bottleneck in single-core execution. Changing the index arrays from [Int] to [Int32] reduced execution time by 20%.

| Before | After |
| --- | --- |
| 17.1 s | 13.7 s |
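As a rough illustration of what the narrower element type changes, the sketch below compares the storage needed by an `[Int]` and an `[Int32]` buffer of the same length. The names `wideIndices` and `narrowIndices` and the buffer size are hypothetical and are not taken from swift-gif; the 2:1 ratio assumes a 64-bit platform, where `Int` is 8 bytes.

```swift
// Hypothetical index buffers; swift-gif's real data structures may differ.
// One entry per pixel of a 640x640 frame, chosen only for illustration.
let pixelCount = 640 * 640

let wideIndices = [Int](repeating: 0, count: pixelCount)
let narrowIndices = [Int32](repeating: 0, count: pixelCount)

// Bytes occupied by the element storage of each array.
// Assumption: 64-bit platform, so Int has a stride of 8 bytes.
let wideBytes = wideIndices.count * MemoryLayout<Int>.stride
let narrowBytes = narrowIndices.count * MemoryLayout<Int32>.stride

print("Int storage:   \(wideBytes) bytes")
print("Int32 storage: \(narrowBytes) bytes")
```

Halving the element size means every pass over the buffer touches half as many bytes, which is one plausible contributor to the measured difference.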

Next, I parallelized across all CPU cores, adding the 8 performance cores first and the 2 efficiency cores last. I then re-ran the benchmark with a work redistribution algorithm that balances the load across cores. Parallel efficiency is a metric usually used to measure strong scaling on supercomputers. Efficiency cores count as half a core for this metric.
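The thread does not show the fork's code, so the following is only a minimal sketch of one common way to encode frames in parallel with simple load balancing. It swaps in dynamic scheduling (each worker claims the next unprocessed frame from a shared counter) in place of the unspecified work redistribution algorithm. `encodeFrame`, `FrameQueue`, and `encodeFramesInParallel` are hypothetical names, not swift-gif API; only `DispatchQueue.concurrentPerform` and `NSLock` are real library calls.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for per-frame GIF encoding; not swift-gif's API.
// A real encoder would quantize colors and LZW-compress the frame here.
func encodeFrame(_ frame: [UInt8]) -> [UInt8] {
    return frame.map { $0 &+ 1 }
}

// Shared state guarded by a lock. Workers claim frames one at a time,
// so a worker that finishes early simply picks up more frames.
final class FrameQueue: @unchecked Sendable {
    private let lock = NSLock()
    private var nextIndex = 0
    private var results: [[UInt8]]
    private let frameCount: Int

    init(frameCount: Int) {
        self.frameCount = frameCount
        self.results = [[UInt8]](repeating: [], count: frameCount)
    }

    // Returns the next unclaimed frame index, or nil when none remain.
    func claim() -> Int? {
        lock.lock()
        defer { lock.unlock() }
        guard nextIndex < frameCount else { return nil }
        let index = nextIndex
        nextIndex += 1
        return index
    }

    // Stores an encoded frame in its original position.
    func store(_ data: [UInt8], at index: Int) {
        lock.lock()
        defer { lock.unlock() }
        results[index] = data
    }

    // Returns all encoded frames in frame order.
    func collect() -> [[UInt8]] {
        lock.lock()
        defer { lock.unlock() }
        return results
    }
}

func encodeFramesInParallel(_ frames: [[UInt8]], workerCount: Int) -> [[UInt8]] {
    let queue = FrameQueue(frameCount: frames.count)
    // concurrentPerform blocks until every worker iteration has returned.
    DispatchQueue.concurrentPerform(iterations: workerCount) { _ in
        while let index = queue.claim() {
            queue.store(encodeFrame(frames[index]), at: index)
        }
    }
    return queue.collect()
}
```

A call such as `encodeFramesInParallel(frames, workerCount: 10)` would match the 10-core configuration. Because frames are claimed one at a time, slower cores naturally end up with fewer frames, which is the effect a load balancer is after.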

| Cores | Original | Load Balanced | Speedup | Parallel Efficiency |
| --- | --- | --- | --- | --- |
| 2 | 7.91 s | 7.12 s | $1.92\times$ | 96% |
| 3 | 5.35 s | 4.95 s | $2.77\times$ | 92% |
| 4 | 4.21 s | 3.74 s | $3.66\times$ | 92% |
| 5 | 3.35 s | 3.07 s | $4.46\times$ | 89% |
| 6 | 3.00 s | 2.62 s | $5.24\times$ | 87% |
| 7 | 2.72 s | 2.34 s | $5.85\times$ | 84% |
| 8 | 2.40 s | 2.06 s | $6.65\times$ | 83% |
| 9 | 2.34 s | 1.99 s | $6.88\times$ | 81% |
| 10 | 2.15 s | 1.90 s | $7.21\times$ | 80% |
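The figures above appear consistent with computing speedup as the 13.7 s single-core time divided by the load-balanced time, and parallel efficiency as that speedup divided by the effective core count, where each efficiency core counts as 0.5. The sketch below works this through for two rows; the function names are hypothetical, and the formula is inferred from the numbers rather than stated in the thread.

```swift
// Effective core count: each efficiency core counts as half a core.
func effectiveCores(performance: Int, efficiency: Int) -> Double {
    return Double(performance) + 0.5 * Double(efficiency)
}

// Parallel efficiency = (baseline time / parallel time) / effective cores.
func parallelEfficiency(baseline: Double, parallel: Double,
                        performance: Int, efficiency: Int) -> Double {
    let speedup = baseline / parallel
    return speedup / effectiveCores(performance: performance, efficiency: efficiency)
}

// 2 cores: both are performance cores.
let twoCore = parallelEfficiency(baseline: 13.7, parallel: 7.12,
                                 performance: 2, efficiency: 0)

// 10 cores: 8 performance + 2 efficiency = 9 effective cores.
let tenCore = parallelEfficiency(baseline: 13.7, parallel: 1.90,
                                 performance: 8, efficiency: 2)

print(Int((twoCore * 100).rounded()))  // prints 96
print(Int((tenCore * 100).rounded()))  // prints 80
```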

Overall, this was a +801% speedup over the original 17.1 s single-core time, or about $9\times$ faster.

philipturner, Aug 23 '23 01:08

Pretty cool, feel free to open a PR. I'm actually surprised that Int -> Int32 made such a big difference. I would not have expected that, at least not on modern 64-bit CPUs.

fwcd, Aug 23 '23 01:08

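One theory is that the compiler can vectorize the loop and perform more comparisons per instruction, since twice as many Int32 values as Int values fit in a vector register. Another theory is slightly reduced memory bandwidth usage, since the index arrays are half the size.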
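To make the first theory concrete, here is a small sketch using Swift's standard-library SIMD types. It assumes a 128-bit vector register, which holds four Int32 lanes but only two Int64 lanes, so a single vector comparison covers twice as many indices. It does not check whether the compiler actually auto-vectorizes the loop in swift-gif.

```swift
// Four 32-bit lanes and two 64-bit lanes both occupy 16 bytes (128 bits).
let a32 = SIMD4<Int32>(1, 2, 3, 4)
let b32 = SIMD4<Int32>(1, 0, 3, 0)
let mask32 = a32 .== b32  // four element-wise comparisons at once

let a64 = SIMD2<Int64>(1, 2)
let b64 = SIMD2<Int64>(1, 0)
let mask64 = a64 .== b64  // only two comparisons for the same vector width

print(MemoryLayout<SIMD4<Int32>>.size, MemoryLayout<SIMD2<Int64>>.size)
print(mask32.scalarCount, mask64.scalarCount)
```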

philipturner, Aug 23 '23 02:08