
[Feature] Implement hardware accelerated decoding

Open • NodudeWasTaken opened this issue 2 years ago • 6 comments

Is your feature request related to a problem? Please describe.
Generating and transcoding are CPU-heavy operations; offloading video decoding to the GPU can greatly improve performance and speed.

Describe the solution you'd like
A checkbox enabling hardware-accelerated decoding for:

  • Generation (of previews and scrubber sprites).
  • Live transcoding.
  • Task transcoding.

Hardware-accelerated decoding can easily be enabled in ffmpeg by adding -hwaccel auto before the input file (that is, before -i).

Describe alternatives you've considered
Another solution could be user-customizable ffmpeg commands.

Additional context
It's important to note that hardware-accelerated decoding can reduce the quality of the videos. I'm also unaware of what happens if the selected decoder can't decode a given video because of resolution limits or some other incompatibility.

NodudeWasTaken, Aug 21 '22

This should help reduce the load on the CPU, because even integrated GPUs now support encoding/decoding HEVC, VP9, etc.

I see there is a PR for this, https://github.com/stashapp/stash/pull/1041, that has been sitting for more than a year now. Will -hwaccel auto make it much easier (without specifying the method)?

deepradio, Nov 17 '22

That PR is for transcoding/encoding video, which is unrelated. All -hwaccel will do is speed up video decoding (and, by extension, preview generation, and possibly CPU transcoding when resources are low). You could instead specify a hardware decoder for full-hardware transcoding, but that should probably be reserved for its own PR.

NodudeWasTaken, Nov 17 '22

Thanks for the clarification! I'm not very familiar with this. So for live transcoding, will -hwaccel auto help? I would like that to be faster. It looks like Stash is using pure CPU for that right now.

deepradio, Nov 18 '22

If you currently hit 100% CPU usage while transcoding, it will help.

NodudeWasTaken, Nov 22 '22

$150 bounty assigned (ref: 595560)

WithoutPants, Nov 26 '22

My preferred implementation for the first iteration is to allow customisation of the ffmpeg command line for generation and live streaming. We can put more hard-coded options in a subsequent iteration.

WithoutPants, Nov 26 '22

One thing: even if that goes through, Docker users wouldn't be able to use it without GPU passthrough.

NodudeWasTaken, Dec 08 '22

One note from me: I modified the stash code to use GPU decoding and encoding (no PR, because I just hardcoded the specific values I wanted). I ended up removing hardware-accelerated sprite generation because it was actually slower. Probably asking the GPU to decode a single frame was slower than just doing it with the software decoder.

Kruk2, Dec 18 '22

@NodudeWasTaken Honestly, we're pretty used to it at this point.

dsrtusr88, Dec 29 '22