
Cross-platform native video playback support

Open fdelapena opened this issue 2 years ago • 8 comments

A quick forum search shows this has been proposed several times, going back as far as 15 years. Since then, operating systems have evolved to provide more solid video support with modern codecs, so shipping fat codec libraries is no longer needed.

There seem to be no lightweight, native cross-platform libraries covering all these platforms so far, so there is a good opportunity for innovation here. The "Qt Multimedia" package takes a similar native approach, but it is not a lightweight dependency because of Qt itself.

Lightweight cross-platform support for video playback on selected operating systems using native APIs:

  • Freedesktop-based systems: GStreamer. Available on most distros, since its modular codec packaging avoids patent concerns. PackageKit integration allows prompting the user to download missing codecs on the fly if needed.
  • Windows: Media Foundation (Vista or later). Supports the H.264 video codec, plus extra codecs from the Microsoft Store (AV1, etc.).
  • macOS/iOS: AVFoundation. It seems to support modern codecs, and even some patent-free codecs on recent versions.
  • Android: MediaPlayer. Good coverage of patent-free codecs.
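
As a rough illustration of the one-backend-per-platform idea in the list above, here is a minimal sketch of compile-time backend selection. The function name `pick_video_backend` is hypothetical and not part of SDL, and treating every remaining platform as a GStreamer target is an assumption made only for this example.

```c
#include <assert.h>

/* Hypothetical helper, not an SDL API: names the native backend that the
 * list above maps to the platform this file is compiled for. */
const char *pick_video_backend(void)
{
#if defined(_WIN32)
    return "Media Foundation";
#elif defined(__APPLE__)
    return "AVFoundation";   /* macOS and iOS */
#elif defined(__ANDROID__)
    return "MediaPlayer";    /* must be tested before the generic fallback */
#else
    return "GStreamer";      /* assumed fallback for Freedesktop-based systems */
#endif
}
```

Android is checked explicitly because Android toolchains also identify as Linux, so a generic Linux test placed first would pick the wrong backend.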

fdelapena avatar Dec 08 '22 02:12 fdelapena

Sure, if you want to put together a proposed API and prototype implementation, that would be great.

slouken avatar Dec 08 '22 02:12 slouken

I'd assume this implies native audio decoding support too, so here's my first idea:

// Initialize audio video decoding subsystem
SDL_Init(SDL_INIT_AV);

// Open a new AV instance.
SDL_AV *SDL_AVOpen(SDL_RWops *rwops);
// Close the AV instance.
void SDL_AVClose(SDL_AV *av);

// Get the number of audio streams.
int SDL_AVNumAudios(SDL_AV *av);
// Get the number of video streams.
int SDL_AVNumVideos(SDL_AV *av);

// Open a new audio stream. `index` ranges from 0 (inclusive) to `SDL_AVNumAudios` (exclusive).
SDL_AVAudio *SDL_AVOpenAudio(SDL_AV *av, int index);
// Get audio sample rate.
int SDL_AVAudioGetSampleRate(SDL_AVAudio *avAudio);
// Get channel count.
int SDL_AVAudioNumChannels(SDL_AVAudio *avAudio);
// Get number of samples or (Uint64)-1 if unknown.
Uint64 SDL_AVAudioNumSamples(SDL_AVAudio *avAudio);
// Get audio sample format.
SDL_AudioFormat SDL_AVAudioGetFormat(SDL_AVAudio *avAudio);
// Get current sample position or (Uint64)-1 if unknown.
Uint64 SDL_AVAudioTell(SDL_AVAudio *avAudio);
// Seek to a specific sample position. The offset is always measured from the beginning.
SDL_bool SDL_AVAudioSeek(SDL_AVAudio *avAudio, Uint64 offset);
// Decode samples. `buffer` datatype depends on `SDL_AudioFormat`. Data is converted to native endian order. `nsamples` is the size of the buffer, in **sample frames**, not bytes.
size_t SDL_AVAudioDecode(SDL_AVAudio *avAudio, void *buffer, size_t nsamples);
// Close the audio stream instance.
void SDL_AVAudioClose(SDL_AVAudio *avAudio);

// Open a new video stream. `index` ranges from 0 (inclusive) to `SDL_AVNumVideos` (exclusive).
SDL_AVVideo *SDL_AVOpenVideo(SDL_AV *av, int index);
// Get video dimensions.
void SDL_AVVideoGetDimensions(SDL_AVVideo *avVideo, int *width, int *height);
// Get video pixel format.
SDL_PixelFormatEnum SDL_AVVideoGetPixelFormat(SDL_AVVideo *avVideo);
// Get video duration in seconds.
double SDL_AVVideoGetDuration(SDL_AVVideo *avVideo);
// Get current video position or -1 if unknown.
double SDL_AVVideoTell(SDL_AVVideo *avVideo);
// Seek to a specific position. The offset is always measured from the beginning.
SDL_bool SDL_AVVideoSeek(SDL_AVVideo *avVideo, double offset);
// Advance the video position. Shorthand for `SDL_AVVideoSeek(avVideo, SDL_AVVideoTell(avVideo) + increment)`.
SDL_bool SDL_AVVideoUpdate(SDL_AVVideo *avVideo, double increment);
// Check whether there is a new frame that needs to be decoded.
SDL_bool SDL_AVVideoFrameUpdated(SDL_AVVideo *avVideo);
// Decode a video frame. `newframe` serves several purposes and must be set by the user before decoding:
// * If `newframe` is false:
//   * If no frame needs to be decoded (the update interval is small enough), `newframe` is set to `true` and this function returns NULL. SDL_GetError() will report no error.
//   * If there is a frame that needs to be decoded, `newframe` is unchanged and this function returns a new `SDL_Surface*`, or NULL on decoding error.
// * If `newframe` is true, an `SDL_Surface*` is always returned regardless of `SDL_AVVideoFrameUpdated`, or NULL on failure. `newframe` is unchanged.
SDL_Surface *SDL_AVVideoGetFrame(SDL_AVVideo *avVideo, SDL_bool *newframe);
// Close the video stream instance.
void SDL_AVVideoClose(SDL_AVVideo *avVideo);
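
To make the intended interplay between `SDL_AVVideoUpdate` and `SDL_AVVideoFrameUpdated` more concrete, here is a small self-contained model of the pacing logic. It does not use SDL at all: `VideoClock` and the `video_clock_*` functions are hypothetical names, and a constant frame rate is assumed purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a video stream clock; not an SDL type. */
typedef struct {
    double fps;          /* assumed constant frame rate */
    double position;     /* playback position in seconds */
    long   shown_frame;  /* index of the last frame handed out, -1 if none */
} VideoClock;

/* Advance the playback position, mirroring the proposed SDL_AVVideoUpdate. */
void video_clock_update(VideoClock *c, double increment)
{
    assert(c != NULL && increment >= 0.0);
    c->position += increment;
}

/* Index of the frame that should be visible at the current position. */
long video_clock_due_frame(const VideoClock *c)
{
    return (long)(c->position * c->fps);
}

/* True when the due frame differs from the last one shown, mirroring the
 * proposed SDL_AVVideoFrameUpdated. */
bool video_clock_frame_updated(const VideoClock *c)
{
    return video_clock_due_frame(c) != c->shown_frame;
}

/* Record that the due frame has been decoded and displayed. */
void video_clock_mark_shown(VideoClock *c)
{
    c->shown_frame = video_clock_due_frame(c);
}
```

In a game loop, the caller would advance the clock by the frame delta each tick and only decode when `video_clock_frame_updated` reports true, which is the case the `newframe == false` path of `SDL_AVVideoGetFrame` appears to be designed around.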

Some notes on the API:

  • Standard -1 and NULL return SDL error handling applies, unless noted otherwise.
  • Opening SDL_AV* must not depend on the filename.
  • Video dimensions are assumed to be constant throughout the whole file.
  • Seeking video must be as accurate as possible. Forward seeks should skip video frames, while backward seeks must be frame accurate rather than seeking to the nearest keyframe.
  • I'm not sure whether SDL should convert non-RGB pixel formats or leave that to the user, hence the SDL_AVVideoGetPixelFormat function.
  • I strongly suggest having an FFmpeg backend too, unless GStreamer also has an FFmpeg backend.
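
The frame-accurate seeking note can be sketched as follows: restart decoding at the last keyframe at or before the target, then decode and discard frames until the target is reached. This is only one possible approach under stated assumptions; `SeekPlan` and `plan_accurate_seek` are hypothetical names, and the sketch assumes the container exposes a sorted keyframe index whose first entry is frame 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical seek plan; not an SDL type. */
typedef struct {
    size_t keyframe;  /* frame index where decoding must restart */
    size_t discard;   /* frames to decode and drop before the target frame */
} SeekPlan;

/* `keyframes` holds ascending keyframe indices, with keyframes[0] == 0.
 * Returns the plan for landing exactly on frame `target`. */
SeekPlan plan_accurate_seek(const size_t *keyframes, size_t count, size_t target)
{
    SeekPlan plan;
    size_t lo = 0, hi = count;

    assert(keyframes != NULL && count > 0 && keyframes[0] == 0);

    /* Binary search for the last keyframe at or before the target. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (keyframes[mid] <= target) {
            lo = mid;
        } else {
            hi = mid;
        }
    }

    plan.keyframe = keyframes[lo];
    plan.discard = target - plan.keyframe;
    return plan;
}
```

With keyframes at frames 0, 30, 60 and 90, a seek to frame 75 restarts at keyframe 60 and discards 15 decoded frames before presenting the target.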

I'm interested in this support. Suggestions and criticism of my API proposal are welcome.

MikuAuahDark avatar Dec 08 '22 06:12 MikuAuahDark

This is looking like an SDL satellite API to me, like SDL_image and SDL_mixer.

slouken avatar Dec 08 '22 16:12 slouken

Probably, yeah. Now that you mention it, I think this kind of library should go into a separate SDL library.

MikuAuahDark avatar Dec 08 '22 16:12 MikuAuahDark

I strongly suggest having an FFmpeg backend too, unless GStreamer also has an FFmpeg backend.

Yes, GStreamer has an FFmpeg plugin, used e.g. by Firefox on Linux.

fdelapena avatar Dec 08 '22 20:12 fdelapena

I can't imagine that OS-level video playback mechanisms are well suited for use in video games. Part of the reason for Bink video's popularity with game developers (besides being patent-free) is that it can be decoded in a shader, which makes it easy to integrate into a game (especially if you need the video actually in the game, not just as a full-screen cutscene or something).

If you want something lightweight and easy to integrate but don't want to pay for Bink, there's always pl_mpeg.

sridenour avatar Dec 11 '22 20:12 sridenour

I can't imagine that OS-level video playback mechanisms are well suited for use in video games.

I don't know the other APIs well enough to comment, but Apple's video decode API integrates with Metal and OpenGL to provide native Metal/GL textures of each video frame's contents, e.g. https://developer.apple.com/documentation/corevideo/1456754-cvmetaltexturecachecreatetexture?language=objc

Whether that's faster than Bink, I'm not sure. It probably depends on the specifics of the hardware and such (hardware versus software decoding, video format complexity, GPU performance, etc.).

slime73 avatar Dec 13 '22 23:12 slime73

A quick read tells me that all the backends the OP proposed support decoding directly to GPU textures. However, Media Foundation could be an issue, as it only provides decoding to D3D11 textures.

MikuAuahDark avatar Dec 14 '22 04:12 MikuAuahDark

WGL_NV_DX_interop / WGL_NV_DX_interop2 could help with the D3D11 conversion, but I'm not sure how good the current driver support is on Intel and AMD on Windows.

fdelapena avatar Dec 21 '22 01:12 fdelapena