vid.stab

Sluggishness in Apple Silicon

Open meneguzzi opened this issue 1 year ago • 8 comments

I have tried to use ffmpeg to stabilise a video I took with my cellphone. I had done this before on an Intel Mac, and the speed of the passes was slow but not sluggish. I recently tried doing the same on my Apple Silicon Mac (a quite beefy spec: Apple M1 Max, 32GB RAM), and the processing speed was extremely slow (the best so far was speed=0.0339x). I checked that I am using the versions of all software (ffmpeg and libvidstab) from homebrew compiled for Apple Silicon, so this is definitely not an emulation problem.

ffmpeg -i clip.mp4 -threads 8 -vf vidstabdetect -f null - ; ffmpeg -i clip.mp4 -threads 8 -vf vidstabtransform clip-stabilized.mp4;

This might be my own recollection of what to expect being wrong, but is this speed normal?

meneguzzi avatar May 24 '23 13:05 meneguzzi

Hmm, I guess the M1 does not have the SSE extensions, so the optimized machine code is not used. If this is the reason, the slow speed is expected.
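To illustrate the mechanism described above, here is a minimal sketch of how SSE2 code paths are commonly selected at compile time. It is not taken from the vid.stab sources: the function name `sad_u8` and the macro `SAD_IMPL` are hypothetical, and the sum of absolute differences is only a stand-in for the kind of block comparison a stabilizer performs. On an Arm compiler `__SSE2__` is not defined, so only the scalar loop is built.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>      /* SSE2 intrinsics, x86 only */
#define SAD_IMPL "sse2"
#else
#define SAD_IMPL "scalar"   /* what an Arm build ends up with */
#endif

/* Hypothetical helper: sum of absolute differences of two byte buffers. */
static unsigned long sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned long total = 0;
    size_t i = 0;
#if defined(__SSE2__)
    /* Fast path: 16 bytes per iteration. */
    __m128i acc = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_sad_epu8 yields one partial sum per 64-bit lane. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Fold the two lanes (assumes the total fits in 32 bits). */
    total += (uint32_t)_mm_cvtsi128_si32(acc);
    total += (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#endif
    /* Scalar loop: handles the tail on x86 and everything elsewhere. */
    for (; i < n; i++)
        total += (unsigned long)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return total;
}
```

Both paths return the same result; the difference is only in how many bytes are processed per iteration, which is why a build without the fast path can be much slower while still being correct.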

georgmartius avatar May 24 '23 13:05 georgmartius

I'm trying to understand the issue here: do you have bits of straight asm code in your codebase? Or would it be possible to tweak the Makefile to use specific compiler options to mitigate this problem on Apple Silicon (or other non-Mac Arm machines)?

meneguzzi avatar May 24 '23 13:05 meneguzzi

As a follow-up (so I can try to investigate this further): is there anything I can read about how you use those instructions for the speedup?

meneguzzi avatar May 25 '23 12:05 meneguzzi

As an update, I found that there are some solutions to this problem of the SSE API:

  • https://github.com/DLTcollab/sse2neon
  • https://github.com/simd-everywhere/simde

The first one seems to be the easiest to use. If I get some time, I aim to try compiling this locally and do a pull request.
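For context, sse2neon is a single header, `sse2neon.h`, that implements SSE intrinsics on top of Arm NEON, so existing SSE code can often be compiled on Arm by swapping the include. The sketch below shows that pattern under stated assumptions: `HAVE_SSE2_API` and `blend_u8` are hypothetical names, not part of vid.stab or sse2neon, and the header is assumed to sit on the include path (the code falls back to plain C if it is not found).

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#  include <emmintrin.h>            /* native SSE2 on x86 */
#  define HAVE_SSE2_API 1
#elif defined(__ARM_NEON) && defined(__has_include)
#  if __has_include("sse2neon.h")
#    include "sse2neon.h"           /* SSE intrinsics mapped to NEON */
#    define HAVE_SSE2_API 1
#  endif
#endif
#ifndef HAVE_SSE2_API
#  define HAVE_SSE2_API 0           /* neither available: plain C only */
#endif

/* Hypothetical helper: rounded per-byte average of two buffers. */
static void blend_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if HAVE_SSE2_API
    /* Same source on x86 and, via sse2neon, on Arm. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    /* Scalar tail, using the same rounding as _mm_avg_epu8. */
    for (; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```

The point of the design is that the intrinsic code itself stays untouched; only the include and the feature check change, which is what makes this route attractive for a quick port.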

meneguzzi avatar May 29 '23 08:05 meneguzzi

As an update, I found that there are some solutions to this problem of the SSE API:

  • https://github.com/DLTcollab/sse2neon
  • https://github.com/simd-everywhere/simde

The first one seems to be the easiest to use. If I get some time, I aim to try compiling this locally and do a pull request.

@meneguzzi Did you manage to make this work?

juanctecdam avatar Jun 07 '23 10:06 juanctecdam

Hi, I tried for an hour or so to make sse2neon work, and it does compile, generating a dynamic library. I have not had time to test it with ffmpeg yet; I will try to do it ASAP.

meneguzzi avatar Jun 07 '23 11:06 meneguzzi

Done, I pushed the changes I made as a workaround. I did not have time to test compiling ffmpeg with them, so if somebody (or @juanctecdam) has time to do it, I'd appreciate it.

meneguzzi avatar Jun 07 '23 11:06 meneguzzi