[WIP] Use vt for manually decoding frames. Fixes #533

Open felipejfc opened this issue 3 years ago • 25 comments

Two main changes

1 - Use VideoToolbox to manually decode each frame instead of submitting it directly to AVSampleBufferDisplayLayer. I'm not proud of this change, but it was needed to fix https://github.com/moonlight-stream/moonlight-ios/issues/533. There may be a way to fix the issue without this change, but I haven't managed to find one yet.

2 - Latency and smoothness changes

2.1 - Use Direct Submit in VideoDecoderRenderer (reduces latency)

2.2 - Use PTS information correctly per frame instead of using the DisplayImmediately flag in each sampleBuffer. Together with the change above, I was able to replicate the smooth, low-latency stream I get on the Nvidia Shield. I think using the flag messed with frame timing and caused jittering.

Right now, I'm breaking the "Smooth Stream" option that we added some months ago, but wanted to create the PR either way for us to discuss options @cgutman

felipejfc avatar Nov 26 '22 19:11 felipejfc

Plot Twist: As part of tackling the improvements you suggested @cgutman, I hit the reason for the original issue #533. Still unsure about the root cause, but it's these lines:

https://github.com/moonlight-stream/moonlight-ios/blob/master/Limelight/Stream/VideoDecoderRenderer.m#L346-L360

I ran some tests, and we don't even need to set any value in the dict; only getting it will cause the decoder to go nuts:

    CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
    CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);

This is enough to make the decoder fail when the sample buffers contain HDR data. I imagine it must be due to some OS bug. If I remove the whole block, the decoder works just fine. Interestingly enough, these lines also break the manual VTDecompression flow (error kVTVideoDecoderMalfunctionErr | -12911); this is how I figured I should try commenting them out in the original solution.

Given that:

  • Do you think the solution to decouple decoding still makes sense? Or should we move forward with removing these lines only? Honestly, I'm not sure if they make any difference. From my tests, I see none when I change them.
  • I can send another PR with only this change and using the PTS information; Since we can't get the reference to the dict this will be needed as we can't set DisplayImmediately flag. WDYT?

felipejfc avatar Nov 28 '22 12:11 felipejfc

Do you think the solution to decouple decoding still makes sense? Or should we move forward with removing these lines only? Honestly, I'm not sure if they make any difference. From my tests, I see none when I change them.

Removing those lines should be fine.

I can send another PR with only this change and using the PTS information; Since we can't get the reference to the dict this will be needed as we can't set DisplayImmediately flag. WDYT?

Yep, let's do that to fix #533 ASAP and we can see if using a VTDecompressionSession improves things further vs the current pure AVSampleBufferDisplayLayer solution.

Do you see a frame pacing regression using the PTS info with the pacing option enabled? If so, we can just use this solution for HDR streaming only for now.

cgutman avatar Nov 30 '22 00:11 cgutman

@cgutman as part of the changes, I wanted to test different ways of decoding and rendering (using VT to manually decode and update a CALayer with the resulting image, versus continuing to pass the encoded buffer directly to AVSampleBufferDisplayLayer) and then measure the latency of each approach. Any ideas on how I could benchmark these solutions reliably?

felipejfc avatar Dec 02 '22 02:12 felipejfc

I suppose you could use a phone in slow motion mode.

For now though, let's try to get HDR on the new Apple TV, then we can fine tune things later.

Can you send your basic PR with just the HDR fix?

cgutman avatar Dec 02 '22 03:12 cgutman

@cgutman I isolated the changes to fix HDR here: https://github.com/moonlight-stream/moonlight-ios/pull/536. I will keep researching different drawing methods to improve latency -- on the newer ATV 4K it's still more noticeable than on M1 Macs and the Nvidia Shield.

felipejfc avatar Dec 02 '22 12:12 felipejfc

Going to build and test the lowest latency option on my Apple TV 4K 2021 with MoCA setup and report back!

Starlank avatar Dec 02 '22 13:12 Starlank

@felipejfc do the latency improvements only apply to the 2022 Apple TV 4K?

Starlank avatar Dec 02 '22 19:12 Starlank

@Starlank the changes here and in the other PR should not improve latency; but should improve stream "smoothness" using the low latency pacing mode. I'm currently studying latency improvements locally.

felipejfc avatar Dec 04 '22 20:12 felipejfc

If you would like to change SDR to HDR:

// Ask VideoToolbox to convert decoded frames to BT.2020 primaries/matrix with the PQ transfer function
let pixelTransferProperties = [
    kVTPixelTransferPropertyKey_DestinationColorPrimaries: kCVImageBufferColorPrimaries_ITU_R_2020,
    kVTPixelTransferPropertyKey_DestinationTransferFunction: kCVImageBufferTransferFunction_SMPTE_ST_2084_PQ,
    kVTPixelTransferPropertyKey_DestinationYCbCrMatrix: kCVImageBufferYCbCrMatrix_ITU_R_2020
]

VTSessionSetProperty(decompressionSession,
                     key: kVTDecompressionPropertyKey_PixelTransferProperties,
                     value: pixelTransferProperties as CFDictionary)

Do not forget that on tvOS it is necessary to switch the TV to HDR mode.

Alanko5 avatar Dec 16 '22 11:12 Alanko5

Thanks for the review @Alanko5. I have doubts regarding the manual decompression approach though, as I wasn't able to reduce video latency with it. What reduced it the most was using the kCVPixelBufferIOSurfacePropertiesKey property so that the image received in the decompression callback has a backing IOSurface, and setting the displayLayer contents directly (basically ditching the SampleBufferDisplayLayer). Btw, given how much latency the Apple TV has, even the newest model, compared to iPads or iPhones, I think it's some hardware-related latency between the ATV and the display (monitor/TV).

felipejfc avatar Dec 16 '22 13:12 felipejfc

If you would like to change SDR to HDR:

// Ask VideoToolbox to convert decoded frames to BT.2020 primaries/matrix with the PQ transfer function
let pixelTransferProperties = [
    kVTPixelTransferPropertyKey_DestinationColorPrimaries: kCVImageBufferColorPrimaries_ITU_R_2020,
    kVTPixelTransferPropertyKey_DestinationTransferFunction: kCVImageBufferTransferFunction_SMPTE_ST_2084_PQ,
    kVTPixelTransferPropertyKey_DestinationYCbCrMatrix: kCVImageBufferYCbCrMatrix_ITU_R_2020
]

VTSessionSetProperty(decompressionSession,
                     key: kVTDecompressionPropertyKey_PixelTransferProperties,
                     value: pixelTransferProperties as CFDictionary)

Do not forget that on tvOS it is necessary to switch the TV to HDR mode.

Is this SDR->HDR mapping?

felipejfc avatar Dec 16 '22 13:12 felipejfc

Yes, Apple mentions it somewhere in the documentation. It does not generate an HDR image, but it improves SDR.

Alanko5 avatar Dec 16 '22 14:12 Alanko5

Thanks for the review @Alanko5. I have doubts regarding the manual decompression approach though, as I wasn't able to reduce video latency with it. What reduced it the most was using the kCVPixelBufferIOSurfacePropertiesKey property so that the image received in the decompression callback has a backing IOSurface, and setting the displayLayer contents directly (basically ditching the SampleBufferDisplayLayer). Btw, given how much latency the Apple TV has, even the newest model, compared to iPads or iPhones, I think it's some hardware-related latency between the ATV and the display (monitor/TV).

I don't think it's caused by HW. tvOS is a different system than iOS. I think it may just be a matter of finding some setting that needs to be enabled. But I may be wrong.

Alanko5 avatar Dec 16 '22 14:12 Alanko5

What latency are we talking about? I just tried measuring the decoding time: H.265 decoding took 0.001 s. Where do you see this delay? Maybe I don't fully understand the problem.

Alanko5 avatar Dec 16 '22 14:12 Alanko5

There's streaming delay with the ATV 4K when compared to streaming with an iPhone/iPad or Nvidia Shield. I compared them using a stopwatch application and a slow-mo iPhone camera to compare the time shown on the PC screen with the time shown on the streaming screen.

felipejfc avatar Dec 16 '22 15:12 felipejfc

I understand. Measure how long it takes you to decode. According to my measurements, it is 0.001 s. If you think the decoding is causing the delay, switch to H.264. There, the delay is even smaller.

In my opinion, the timing will help you solve the problem.

You are not using Wi-Fi when measuring, are you? :-)

Alanko5 avatar Dec 16 '22 15:12 Alanko5

For 4k HEVC I think I was getting 8ms time to decode each frame. 10~11 ms total time to receive the whole frame, pack it together and decode

felipejfc avatar Dec 16 '22 17:12 felipejfc

Did you measure the decompression time of the Key and non-Key frames? Can you set the server to send fewer keyframes? (for example one per two seconds)

What is the total delay of the image that you measured with the camera?

What version of apple tv do you have?

How do you create a VTDecompressionSession? I mean, what parameters are you setting?

Alanko5 avatar Dec 16 '22 17:12 Alanko5

Code is in this branch https://github.com/felipejfc/moonlight-ios/tree/ds_queue_surface

I have the latest Apple TV 4K (2022) with the iPhone 12 Pro processor.

What is the total delay of the image that you measured with the camera?

The streamed image would always be 25~50 hundredths behind the original image; when testing with an M1 MacBook or Nvidia Shield, most of the time the images would be in sync.

Did you measure the decompression time of the Key and non-Key frames?

I measured all frames and they all took this same amount of time.

Can you set the server to send fewer keyframes?

Pretty sure GameStream won't allow me to do it.

felipejfc avatar Dec 16 '22 17:12 felipejfc

According to what you write, the problem is not in decoding. I think that by improving the decoding you can gain a maximum of 5ms. The first thing I would look for is a network or rendering delay. Because a delay of 250~500ms is huge!

Well, you can try as follows:

You need to set this value (as I wrote above): kCVPixelBufferMetalCompatibilityKey

Your destinationImageBufferAttributes:

NSDictionary *pixelAttributes = @{
    (id)kCVPixelBufferMetalCompatibilityKey : (id)kCFBooleanTrue,
    (id)kCVPixelBufferIOSurfaceCoreAnimationCompatibilityKey : (id)kCFBooleanTrue,
    (id)kCVPixelBufferIOSurfacePropertiesKey : @{},
};

I think that during rendering it would help if the layer could use Metal.

NSDictionary *videoDecoderSpec = @{
    (id)kCMFormatDescriptionExtension_FullRangeVideo : FORMAT_DESC_FullRangeVideo,
    (id)kCVImageBufferChromaLocationBottomFieldKey : kCVImageBufferChromaLocation_Left,
    (id)kCVImageBufferChromaLocationTopFieldKey : kCVImageBufferChromaLocation_Left,
    (id)kCVImageBufferPixelAspectRatioKey : FORMAT_DESC_AspectRatio,
    (id)kCVImageBufferColorPrimariesKey : FORMAT_DESC_ColorPrimaries,
    (id)kCVImageBufferTransferFunctionKey : FORMAT_DESC_TransferFunction,
    (id)kCVImageBufferYCbCrMatrixKey : FORMAT_DESC_YCbCrMatrix
};

Alanko5 avatar Dec 16 '22 18:12 Alanko5

Sorry, I misspelled it. It's actually 25~50ms delay! I will try your changes anyways when I get home; travelling right now so that's only next week

felipejfc avatar Dec 16 '22 18:12 felipejfc

Why is this still not resolved? What is it waiting for?

jasin755 avatar Jan 02 '25 15:01 jasin755

A few of us have been hacking on some iOS frame pacing stuff and I've incorporated this PR into some test code I'm working on. My branch is at https://github.com/andygrundman/moonlight-ios/tree/andyg.ios-frame-pacing but fair warning it's very much an experimental WIP. Just wanted you to know this good patch has not been forgotten.

andygrundman avatar Jun 09 '25 01:06 andygrundman

I'm available to help with testing to close out this issue as well! Just got an Apple TV 4K, and the amount of lag I'm experiencing in Moonlight is very noticeable.

  • I experience no perceptible lag when using an Nvidia Shield, but the lag on Apple TV 4K is noticeable
  • I've tried an Xbox One controller connected to the Apple TV 4K and also to the host computer - noticeable lag in both scenarios
  • My Apple TV 4K is connected via ethernet, so bandwidth / network latency is not the problem

pierreski avatar Jun 17 '25 16:06 pierreski

Hi, same here. I also just got a new Apple TV 4K (3rd gen) and am experiencing render delays. The input device is directly connected to the PC. I would also be happy to help with testing, just like @pierreski.

adamsondavid avatar Jul 12 '25 18:07 adamsondavid