
Input lag with vsync enabled due to limited polling rates

alice-i-cecile opened this issue 3 years ago • 15 comments

Problem

@aevyrie has observed noticeable input lag in Bevy applications when vsync is enabled.

The most immediate source of this is quite obvious: we're only fetching the input state at the start of each frame, but rendering is done at the end of the frame.

Input events appear to be moved into the Bevy app by the winit runner: https://github.com/bevyengine/bevy/blob/de8edd3165c379e05aabb38359b3f4b97f46540a/crates/bevy_winit/src/lib.rs#L221

At 60 fps, this means 16 ms of lag, which is noticeable for some applications: namely precise cursor movement (FPS games, GUI applications) and rhythm games.

Possible solutions

  • Poll more regularly (somewhat limited by the OS I believe)
    • I suspect this could be done at the end of each stage
    • Perhaps a dedicated input-polling thread architecture would help?
  • Poll closer to rendering time

Either solution will involve some trickiness, as we must pierce the Bevy schedule in some fashion in order to insert fresh input events into the World at the right time, rather than merely at the beginning of each pass over the schedule.

alice-i-cecile avatar Dec 13 '21 17:12 alice-i-cecile

IIRC, we are limited by the winit event loop, and we can't poll mouse input asynchronously.

I have some ideas I could prototype that effectively act as a frame limiter; this is related to #1343.

How bevy currently behaves with vsync on:

0ms -----------------------
        Start of event loop. Get input. Do stuff.
4ms     Done.
        --
        Sit around and do nothing
        --
        Send to GPU, present frame (input is 16ms out of date)
16ms ----------------------

How bevy currently behaves with vsync off:

0ms -----------------------
        Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
4ms     Done.
        Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
8ms     Done.
        Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
12ms    Done.
        Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
16ms ----------------------

What I'd like bevy to do with vsync on:

0ms -----------------------
        Sleep until we have just enough time to render a frame, based on how long it took previously.
12ms    Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
16ms ----------------------

We could prototype this by adding a system at the end of the event loop that sleeps for a while after the frame has been sent to the GPU. I'm not intimately familiar with how all that works, but I can try something out.
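A minimal sketch of what that could look like, assuming Bevy 0.6-era APIs and a hard-coded 60 Hz target (the system name and timing bookkeeping here are illustrative, not from any existing crate):

```rust
use bevy::prelude::*;
use std::time::{Duration, Instant};

/// One 60 Hz vsync interval.
const TARGET_FRAMETIME: Duration = Duration::from_micros(16_666);

/// Runs as the last system of the frame: pad out whatever is left of the
/// frame budget, so the next pass over the schedule (which polls input)
/// starts as late as possible before the following vsync.
fn frame_limiter(mut last_run: Local<Option<Instant>>) {
    if let Some(last) = *last_run {
        let elapsed = last.elapsed();
        if elapsed < TARGET_FRAMETIME {
            std::thread::sleep(TARGET_FRAMETIME - elapsed);
        }
    }
    *last_run = Some(Instant::now());
}

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        // CoreStage::Last runs after everything else in the main schedule.
        .add_system_to_stage(CoreStage::Last, frame_limiter)
        .run();
}
```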

aevyrie avatar Dec 13 '21 18:12 aevyrie

I believe mailbox vsync should behave similarly to disabled vsync in terms of latency.

What I'd like bevy to do with vsync on:

This is called frame pacing, right? You will have to be careful to ensure that variation in render time doesn't cause frames to be submitted too late. Predicting the right time to sleep can be difficult.

bjorn3 avatar Dec 13 '21 19:12 bjorn3

I believe mailbox vsync should behave similarly to disabled vsync in terms of latency.

That's what I would expect, but not what I experience in bevy apps.

Predicting the right time to sleep can be difficult.

Definitely. My use case for this is applications, though; I care more about reducing input latency without just letting the app run at 300 fps, draining the battery. In fact, for the application use case you could safely add a 2x safety factor to the predicted frame render time and still get a huge improvement in latency without risking dropped frames, because applications like this take very little time to render - on the order of 1-3 ms.

aevyrie avatar Dec 13 '21 20:12 aevyrie

Why sleep before the frame, if you can also sleep at the end?

0ms -----------------------
        Start of event loop. Get input. Do stuff.
12ms    Send to GPU, present frame (input is 4ms out of date)
        Sleep until the remaining frame-time is over.
16ms ----------------------

To be honest, this is what I'd expect a frame-limiting strategy to look like. Just wait at the end until the specified time-frame is over, so that there is no timing issue later on in case the renderer is slower than expected :)

minecrawler avatar Dec 14 '21 13:12 minecrawler

That doesn't get around the timing problem, presenting it this way just makes it seem like the problem doesn't exist. That's why I presented it in reverse - it makes the timing problem more apparent.

You need to make sure the time between "send to GPU" calls is always <16 ms to prevent frame drops. In the order you present, you would need to add a sleep between the time the frame is finished and when it is sent to the GPU, to act as your factor of safety against frame time variability. The "gotcha" here is that it seems like you can just sleep until you've hit a total of 16 ms (because your total frametime is always 16 ms), but that masks the fact that what you actually care about is keeping the interval between "send to GPU" calls at 16 ms.
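To put numbers on that invariant (reusing the illustrative 4 ms work time and 16 ms budget from the diagrams above), the quantity to control is the sleep placed before the work, not the pad after it:

```rust
use std::time::Duration;

fn main() {
    // The invariant: the interval between successive "send to GPU" points
    // must stay under one vsync interval, or a frame gets dropped.
    let target = Duration::from_micros(16_666); // one 60 Hz vsync interval
    let predicted_work = Duration::from_millis(4); // last frame's measured work
    let safety_margin = Duration::from_millis(4); // generous 2x factor for apps

    // Sleeping *before* the work samples input as late as possible while the
    // frame is still submitted on time: 16.666 - 4 - 4 = ~8.7 ms here.
    let pre_work_sleep = target.saturating_sub(predicted_work + safety_margin);
    println!("sleep for {pre_work_sleep:?} before polling input");
}
```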

Anywho, the proof is in the pudding. We should make some prototypes to see if we can make something that works. 😄

aevyrie avatar Dec 14 '21 17:12 aevyrie

I did some work on this.

  1. It appears we are still using FIFO vsync: https://github.com/bevyengine/bevy/blob/ffecb05a0ab423b46ab991b49c54516bcaf2ea7b/crates/bevy_render/src/view/window.rs#L139-L140 See https://github.com/bevyengine/bevy/pull/1416

  2. I made a frame limiter prototype that adds a frame-limiting system to the render sub-app, in the final cleanup stage. You can check it out here: https://github.com/aevyrie/bevy_latency

It works like this:

0ms -----------------------
        Start of event loop. Get input. Do stuff.
 4ms    Send to GPU, present frame (input is 4ms out of date)
New!    Run a stopwatch in a system in RenderStage::Cleanup to time how long a frame takes (not including sleep)
New!    Sleep the thread for the remaining predicted frame time, minus some safety margin to prevent frame drops
            in case the frame actually takes longer to render than predicted.
16ms ----------------------
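In code, that cleanup-stage system could look roughly like this (a sketch of the approach, not the actual bevy_latency source; the resource name, margin, and target are illustrative):

```rust
use bevy::prelude::*;
use bevy::render::{RenderApp, RenderStage};
use std::time::{Duration, Instant};

const TARGET: Duration = Duration::from_micros(16_666); // one 60 Hz interval
const SAFETY_MARGIN: Duration = Duration::from_millis(1);

#[derive(Default)]
struct FrameTimer {
    last_wake: Option<Instant>,
}

/// Runs in RenderStage::Cleanup, after the frame has been sent to the GPU.
fn framerate_limiter(mut timer: ResMut<FrameTimer>) {
    // Stopwatch: how long this frame's real work took (excluding our sleep).
    let work = timer.last_wake.map(|wake| wake.elapsed()).unwrap_or_default();
    // Sleep away the remaining budget, minus a margin in case the next
    // frame's work takes longer than this one's did.
    std::thread::sleep(TARGET.saturating_sub(work + SAFETY_MARGIN));
    timer.last_wake = Some(Instant::now());
}

fn main() {
    let mut app = App::new();
    app.add_plugins(DefaultPlugins);
    app.sub_app_mut(RenderApp)
        .init_resource::<FrameTimer>()
        .add_system_to_stage(RenderStage::Cleanup, framerate_limiter);
    app.run();
}
```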

Here's a trace to better visualize; notice the large blue bar labeled "framerate limiter": [trace image]

I don't have any empirical measurements, but so far it seems promising.

Mailbox vsync, framerate limiter enabled:

https://user-images.githubusercontent.com/2632925/146698956-097151fd-0579-4c8d-8774-5eee09e6bc6d.mp4

Mailbox vsync, framerate limiter disabled:

https://user-images.githubusercontent.com/2632925/146698763-3dd05b4b-5eec-49e0-87ce-a75343de5166.mp4

With the framerate limiter, the 3d cursor feels perceptibly less sluggish, but it's still not as good as with vsync off. I also tested without vsync, both with and without the framerate limiter; I was still seeing some tearing with the framerate limited to ~60, though on the plus side I did see significantly less GPU/CPU usage.

aevyrie avatar Dec 20 '21 01:12 aevyrie

I modified the prototype to add a second sleep system that caps the frame rate to exactly what you specify.

0ms -----------------------
            Estimate how long the next frame will take, minus a small margin to give us space if it takes longer
            Sleep for this duration. (Forward estimation)
8ms         Start of event loop. Get input. Do stuff.
11.8ms      See how close our estimate was to the requested frame time, sleep if required to get the frametime just right
12ms        Send to GPU, present frame (input is 4ms out of date)
16ms ----------------------

[annotated trace image]

Red annotation: forward estimation. Blue annotation: precise frametime limiter accounting for error and margin in the last frame's forward estimation.

I'm seeing some really awesome results; the 3d cursor is basically glued to the mouse cursor:

https://user-images.githubusercontent.com/2632925/146732106-864b1afe-8fd3-48de-a467-9220fdb9117b.mp4

In addition, I can bring my safety margin pretty low without frame drops - on the order of 100μs. I've bumped it up to 500μs to reduce the chance of frame drops, at the cost of only 400μs more input lag.

Edit: this wasn't possible without spin_sleep to get precise sleep times. Thanks for the suggestion @cwfitzgerald!
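Sketched as plain functions (not full Bevy systems) to keep it short, the two sleeps fit together like this; the struct and numbers are illustrative, see bevy_latency for the real implementation:

```rust
use spin_sleep::SpinSleeper;
use std::time::{Duration, Instant};

struct Pacing {
    target: Duration,     // e.g. one 60 Hz vsync interval (~16.6 ms)
    margin: Duration,     // safety margin, ~100-500 us as discussed above
    predicted: Duration,  // measured work time of the previous frame
    frame_start: Instant, // when this frame's work began
}

/// 1. Forward estimation, run before the frame's work begins: sleep so the
///    predicted work plus the margin lands right at the end of the interval.
fn forward_estimation(p: &mut Pacing, sleeper: &SpinSleeper) {
    sleeper.sleep(p.target.saturating_sub(p.predicted + p.margin));
    p.frame_start = Instant::now();
}

/// 2. Precise limiter, run after the work but before presenting: absorb the
///    estimation error so the interval between presents is exactly `target`.
fn precise_limiter(p: &mut Pacing, sleeper: &SpinSleeper) {
    let work = p.frame_start.elapsed();
    // The forward sleep was (target - predicted - margin), so padding by
    // (predicted + margin - work) brings the whole frame to exactly `target`.
    sleeper.sleep((p.predicted + p.margin).saturating_sub(work));
    p.predicted = work; // feed the measurement into the next frame's estimate
}
```

A SpinSleeper (e.g. SpinSleeper::default()) sleeps the thread for most of the duration and spin-waits the remainder, which is what makes sub-millisecond margins workable; std::thread::sleep alone can overshoot by a millisecond or more on some platforms.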

aevyrie avatar Dec 20 '21 08:12 aevyrie

Some other neat byproducts of this: it's now really easy to frame-limit to an arbitrarily low FPS, for power use or other reasons. However, because our input -> render latency is constant (3.5 ms in my case), the game/app still feels really responsive at low framerates! Here's the demo locked to only 20 fps, yet the 3d cursor still doesn't lag behind the OS cursor very much:

https://user-images.githubusercontent.com/2632925/146747108-8ab3c127-61b1-45cd-b46d-4da3dd0216c5.mp4

It doesn't feel laggy or jello-y, because the motion-to-photon time is still low; it only feels choppy because it doesn't update very frequently.

This brings up an interesting idea for system scheduling too. If a game or application is sensitive to input responsiveness but needs a large frametime budget, it could schedule everything compute-intensive after the render stage, instead of between input and the render stage. Those changes would then take up to a full frametime to display, but you gain the ability to put only the critical things (like transforming objects in the world based on user input) in the pre-render schedule.

aevyrie avatar Dec 20 '21 09:12 aevyrie

I posted this in discussions, but it's also relevant for this issue... Here is my quick-and-dirty fix based on #6503 (pipelined rendering), which enables multiple app/input updates per rendered frame, thus processing input immediately even when using vsync. It probably breaks all sorts of stuff that implicitly assumes one app frame equals one render frame. It also probably doesn't work on all platforms; I only tested on Linux, where it works fine.

https://github.com/dafteran4/bevy/tree/multi-app-step-while-rendering

dafteran4 avatar Nov 23 '22 17:11 dafteran4

What's the current progress on improving input lag? I have my own ideas on how to improve input timings, but it seems like the problem is MUCH worse than that. All the Bevy programs I've tested have absolutely terrible input lag, taking many frames to process inputs.

Tests: I used carnac (and recording) to see how long it takes for bevy projects to process inputs (60 fps, release mode, Windows 10, Rust nightly 0.70):

  • For my project (bevy 0.10), it consistently updated 4 or 5 frames (~75 ms) after carnac did
  • For vx_bevy (bevy 0.9.1), it consistently updated 3 or 4 frames (~60 ms) after carnac did

This might not sound bad, but it feels absolutely horrible and I can't even consider using Bevy unless this is fixed

What42Pizza avatar Mar 11 '23 04:03 What42Pizza

Have you experimented with bevy_framepace? That was designed in large part to reduce input lag in sensitive GUI applications and in my experience it helps quite a bit.

We're looking to upstream that, and to expose and use UI time stamps as well.

alice-i-cecile avatar Mar 11 '23 15:03 alice-i-cecile

How exactly do you use that with Bevy 0.10? The version on crates.io is for bevy 0.9, and I can't figure out how to use cargo workspaces to use the version in the pull request.

What42Pizza avatar Mar 11 '23 19:03 What42Pizza

Bevy 0.10 was released on the 6th of this month. I can see it just fine on crates.io.

bjorn3 avatar Mar 11 '23 20:03 bjorn3

The question was how do you use this: https://github.com/aevyrie/bevy_framepace/pull/32.

You should be able to specify Alice's fork like so: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories
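For reference, a git dependency in Cargo.toml looks like this (the URL and branch below are placeholders; substitute whatever the PR branch actually is):

```toml
[dependencies]
# Placeholder fork/branch; point this at the PR's actual repository and branch.
bevy_framepace = { git = "https://github.com/alice-i-cecile/bevy_framepace", branch = "bevy-0.10" }
```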

SUPERCILEX avatar Mar 11 '23 20:03 SUPERCILEX

That worked, thanks!

What42Pizza avatar Mar 11 '23 21:03 What42Pizza

This might not sound bad, but it feels absolutely horrible and I can't even consider using Bevy unless this is fixed

Using bevy_framepace, you should be getting <1 frame of latency unless you've created some sort of system order bug. It can be used with and without Vsync, and should work with all PresentModes.

The latency you experience depends on the PresentMode you are using. Fifo will accumulate frames (I think it caps out at 3?). Mailbox should give you near-perfect results, though it isn't supported on all platforms. I've spent quite a lot of time on this issue, and I'm pretty happy with the results I've been able to achieve; there is nothing inherently wrong with Bevy.
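For reference, selecting a present mode in Bevy 0.10 looks roughly like this (a sketch; bevy_framepace itself is configured separately through its plugin):

```rust
use bevy::prelude::*;
use bevy::window::PresentMode;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins.set(WindowPlugin {
            primary_window: Some(Window {
                // Mailbox: low latency, but not supported on all platforms.
                // Fifo: classic vsync; can accumulate several frames of latency.
                present_mode: PresentMode::Mailbox,
                ..default()
            }),
            ..default()
        }))
        .run();
}
```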

The other thing worth mentioning is the new parallel pipelined renderer will add latency if enabled, as the CPU simulation + GPU render end-to-end can take longer than a single frame.

aevyrie avatar Mar 24 '23 09:03 aevyrie

Thanks a lot for this help!

With bevy_framepace, I was able to greatly reduce the latency of my piano app; it was unusable before, and now it works!

rambip avatar Feb 17 '24 17:02 rambip

While I'm not a professional game developer and am just playing with bevy, this behaviour looks weird and broken, no matter what causes it. There should be a way to get rid of input lag without disabling vsync or adding some third-party crate.

PresentMode::AutoNoVsync: [Monosnap screencast 2024-03-23 14-17-43]

PresentMode::AutoVsync or bevy_framepace: [Monosnap screencast 2024-03-23 14-40-27]

morr avatar Mar 23 '24 11:03 morr