Consider strategy for GPU device loss.

Open PJB3005 opened this issue 3 years ago • 6 comments

This can happen on ANGLE and Vulkan for various reasons but we might want to handle it. TL;DR is that we'll need to be able to re-load everything onto the GPU from scratch, at any time.

Biggest consideration is texture reloading. Most textures can be reloaded from disk, but anything uploaded by content or, say, the font system, may not. Do we just cache these in host RAM?
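A rough sketch of the "cache in host RAM" idea, assuming a hypothetical `TextureRegistry` that remembers where each texture's pixels can be recovered from. None of these names come from RobustToolbox; `read_file` and `upload` stand in for whatever the engine actually uses to read resources and push them to the GPU.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TextureRecord:
    # Exactly one of these is set: where to get the pixels back from after a loss.
    disk_path: Optional[str] = None    # texture can be re-read from disk
    host_copy: Optional[bytes] = None  # texture only exists in memory (content, font atlas)


class TextureRegistry:
    """Hypothetical registry, not an existing engine API."""

    def __init__(self,
                 read_file: Callable[[str], bytes],
                 upload: Callable[[str, bytes], None]) -> None:
        self._read_file = read_file  # assumed resource loader
        self._upload = upload        # assumed GPU upload call
        self._records: Dict[str, TextureRecord] = {}

    def load_from_disk(self, name: str, path: str) -> None:
        # Only the path is kept; the pixel data is not retained in host RAM.
        self._records[name] = TextureRecord(disk_path=path)
        self._upload(name, self._read_file(path))

    def load_from_memory(self, name: str, pixels: bytes) -> None:
        # No disk source exists, so keep a host-side copy for later re-upload.
        self._records[name] = TextureRecord(host_copy=bytes(pixels))
        self._upload(name, pixels)

    def reload_all(self) -> None:
        # Called after a device loss, once a fresh GPU context exists.
        for name, record in self._records.items():
            if record.disk_path is not None:
                pixels = self._read_file(record.disk_path)
            else:
                pixels = record.host_copy
            self._upload(name, pixels)
```

With this split, only textures that have no disk source cost extra host memory. GPU-generated render targets are not covered here and would need to be re-rendered instead.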

PJB3005 avatar Aug 15 '21 00:08 PJB3005

As someone who has no clue what this means, are you seriously prepping for someone unplugging their card while the game is running? And if so, why should we even support this?

PaulRitter avatar Aug 17 '21 07:08 PaulRitter

As someone who has no clue what this means, are you seriously prepping for someone unplugging their card while the game is running? And if so, why should we even support this?

AFAIK, this can happen for multiple reasons other than that such as:

  • Graphics driver being updated
  • Going from an iGPU to a dGPU
  • GPU just stops responding and has to be reset

I think one of our contributors ran into this problem while just playing SS14 normally

gradientvera avatar Aug 17 '21 07:08 gradientvera

But yeah SS14 needs to support GPU hotswapping.

ike709 avatar Aug 17 '21 15:08 ike709

As someone who has no clue what this means, are you seriously prepping for someone unplugging their card while the game is running? And if so, why should we even support this?

You know that fancy Microsoft Surface where if you join the keyboard to the tablet it starts using the NVIDIA GPU? There you go.

PJB3005 avatar Aug 17 '21 15:08 PJB3005

I think one of our contributors ran into this problem while just playing SS14 normally

Shadow was having a problem where his GPU was being reset while resizing the window for some reason. Still sounds like a driver bug but oh well.

PJB3005 avatar Aug 17 '21 15:08 PJB3005

Oh yeah, I've had problems on Linux where unsuspending the system causes GPU memory to be cleared. In this case the driver would report a GPU device loss and require us to re-initialize everything instead, buuut desktop OpenGL traditionally had no concept of this so the GL driver is forced to keep a copy of every texture/buffer in host RAM, which is incredibly wasteful. It also can't do this for any GPU-generated textures (anything rendered on the GPU to a framebuffer) which means that many programs have broken screens upon unsuspend.

My understanding is that if we handle GPU device loss ourselves (via GL_ARB_robustness and platform-specific extensions) then the GL driver does not have to keep a host-memory copy and this would reduce memory usage drastically (we can reload textures from disk in this rare scenario instead of keeping them in memory).

A problem is that GLFW requires you to re-create the whole window. This is mildly annoying to say the least.
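For illustration, the per-frame check could look roughly like the sketch below. The status constants are the values defined by GL_ARB_robustness; everything else is an assumption: `get_reset_status` stands in for a binding to `glGetGraphicsResetStatusARB`, and `recreate_context` and `reload_resources` are hypothetical callbacks for the window/context re-creation and the resource re-upload.

```python
from typing import Callable

# Reset status values from the GL_ARB_robustness extension.
GL_NO_ERROR = 0
GL_GUILTY_CONTEXT_RESET_ARB = 0x8253
GL_INNOCENT_CONTEXT_RESET_ARB = 0x8254
GL_UNKNOWN_CONTEXT_RESET_ARB = 0x8255


def check_device_loss(get_reset_status: Callable[[], int],
                      recreate_context: Callable[[], None],
                      reload_resources: Callable[[], None]) -> bool:
    """Poll once per frame; returns True if a device loss was handled.

    Assumes the context was created with the lose-context-on-reset
    notification strategy, otherwise the driver never reports a reset.
    """
    status = get_reset_status()
    if status == GL_NO_ERROR:
        return False
    # Guilty, innocent, or unknown: the old context is unusable either way.
    recreate_context()   # hypothetical: with GLFW this means a new window
    reload_resources()   # hypothetical: re-upload textures, buffers, shaders
    return True
```

Treating all three reset reasons the same keeps the sketch short; a real implementation might want to log a guilty reset differently, since that points at our own rendering commands.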

PJB3005 avatar Aug 17 '21 16:08 PJB3005