Optimized Render code for TextBuffers
This PR replaces the existing TextBuffer rendering infrastructure with a texture sheet.
Benefits:
- Significantly fewer OpenGL calls
- Dramatic FPS increase on certain GPUs (see below for stats)
- Only re-draws the parts of the screen that need it (see the sketch after the TODO list)
- Viewing the screen both in-world and in a window draws a single quad each
TODO:
- Changing the screen resolution does not stretch the contents as expected. I would argue that this is a feature, but it could be considered a bug
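A minimal sketch of the idea, as I understand it (identifiers here are illustrative, not the PR's actual code): the whole text buffer is rasterized into one texture, only the dirty rectangle is re-uploaded, and the visible screen is a single quad.

import java.nio.ByteBuffer;
import org.lwjgl.opengl.GL11;

class ScreenTextureSketch {
    private final int textureId = GL11.glGenTextures();

    // Allocate backing storage once, sized to the buffer in pixels.
    void allocate(int width, int height) {
        GL11.glBindTexture(GL11.GL_TEXTURE_2D, textureId);
        GL11.glTexImage2D(GL11.GL_TEXTURE_2D, 0, GL11.GL_RGBA8, width, height,
                0, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, (ByteBuffer) null);
    }

    // Re-upload only the rectangle that changed since the last frame,
    // instead of re-issuing draw calls for every glyph.
    void uploadDirtyRegion(int x, int y, int w, int h, ByteBuffer pixels) {
        GL11.glBindTexture(GL11.GL_TEXTURE_2D, textureId);
        GL11.glTexSubImage2D(GL11.GL_TEXTURE_2D, 0, x, y, w, h,
                GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, pixels);
    }

    // Whether rendered in-world or in the GUI, the screen is one quad.
    void drawQuad(float w, float h) {
        GL11.glBindTexture(GL11.GL_TEXTURE_2D, textureId);
        GL11.glBegin(GL11.GL_QUADS);
        GL11.glTexCoord2f(0, 0); GL11.glVertex2f(0, 0);
        GL11.glTexCoord2f(1, 0); GL11.glVertex2f(w, 0);
        GL11.glTexCoord2f(1, 1); GL11.glVertex2f(w, h);
        GL11.glTexCoord2f(0, 1); GL11.glVertex2f(0, h);
        GL11.glEnd();
    }
}

The win over per-glyph rendering is that a changed character costs one small glTexSubImage2D upload rather than re-drawing everything.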
Stress Test Program (Courtesy of Ristelle):
local component = require("component")
local gpu = component.gpu
local r = {"00","33","66","99","cc","ff"}
local g = {"00","24","49","6d","92","b6","db","ff"}
local b = {"00","40","80","c0","ff"}
local x,y = gpu.getResolution()
local rr = 1
local rg = 1
local rb = 1
while true do
  rr = rr + 1
  if rr == 6 then
    rr = 1
    rg = rg + 1
  end
  if rg == 8 then
    rg = 1
    rb = rb + 1
  end
  if rb == 5 then
    rb = 1
  end
  local rgb = string.format("0x%s%s%s", r[rr], g[rg], b[rb])
  gpu.setBackground(tonumber(rgb))
  print("Hello Cam asjdkfasdhjkhfakh For example, this.12839054812908902!")
  os.sleep(0)
end
Test Results:
- AMD R9 380: 5 FPS vs 60 FPS (Linux, vsync locked)
- NVIDIA GTX 650: 80 FPS vs 80 FPS
- Intel Celeron 847 HD Graphics (1.1 GHz): 10 FPS vs 30 FPS

More coming soon.
So if I understand correctly, this replaces display lists with textures? In which case, cool, this is something I would have liked to try out (back when I was more active on OC dev :P) but never got around to, in part because I wasn't certain it'd actually be feasible, GPU memory-wise.
And that's one of the two points where you could make me a lot more at ease with merging this: would you mind running a comparison of the new rendering code and the old one with a superflat (less noise) scene that has a whole bunch (*) of individual (actively rendering) T3 screens in it, and see how GPU memory usage reads? GPU-Z should probably work, otherwise there's an OpenGL callback to get memory usage IIRC. I have no idea how memory-(in)efficient display lists are, so this might even be better with textures, but still, I'd like to see actual numbers.
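For what it's worth, the "OpenGL callback" is presumably one of the vendor memory-info extensions; as far as I know there is no vendor-neutral query in core GL. A rough sketch, with the raw enum values inlined in case the bundled LWJGL doesn't expose them as constants:

import java.nio.IntBuffer;
import org.lwjgl.BufferUtils;
import org.lwjgl.opengl.GL11;

class VramProbe {
    // From GL_NVX_gpu_memory_info (NVIDIA only); reported in KiB.
    static final int GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX = 0x9049;
    // From GL_ATI_meminfo (AMD only); query returns four values, [0] = free KiB.
    static final int TEXTURE_FREE_MEMORY_ATI = 0x87FC;

    // Returns free video memory in KiB, or -1 if no known extension is present.
    static int freeVramKiB() {
        String ext = GL11.glGetString(GL11.GL_EXTENSIONS);
        if (ext.contains("GL_NVX_gpu_memory_info")) {
            return GL11.glGetInteger(GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX);
        }
        if (ext.contains("GL_ATI_meminfo")) {
            IntBuffer buf = BufferUtils.createIntBuffer(16);
            GL11.glGetInteger(TEXTURE_FREE_MEMORY_ATI, buf);
            return buf.get(0);
        }
        return -1;
    }
}

Logging that value before and after placing the 32 screens and taking the difference should give a usable number even without GPU-Z.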
Second, I'd very much like for the old method to be available via an option, still. Just so we can tell people in case they should run into any kind of issue to use that until it can get fixed. Once it's proven itself, we can kill the old codepath (say in half a year/with MC 1.13 or so).
I'll be gone for most of the week, but I expect to be able to look at the code more in-depth next weekend.
However this turns out, thanks in advance for taking a shot at this!
(*) Very specific, I know. I'd go with 32 or so, just to be able to see a significant difference and avoid noise messing up results.
I'd like to mention some more details as well.
Some users may experience poorer performance.
Cam Test: [screenshot]

Main Branch: [screenshot]

Same exact settings for both.
Oh wow, I can upload the test world... Sure!
Testworld.zip
First thing I noticed: this does not fix issue #779. That's more of a separate thing focused on the Lua side; this PR is about rendering efficiency (and for rendering it's good! 👍)
Secondly, the stretching would be fine if it only stretched by integer multiples, which it doesn't when the target size wouldn't otherwise fit, leading to it looking wrong when a high resolution is set.
A big change I'd like to introduce with this is making linear scaling the min filter for screens - which was impossible in the old system, but would definitely help performance.
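For illustration, a minimal sketch of what that would look like, assuming the screen contents end up in a single texture (texId is a placeholder, not an identifier from this PR):

import org.lwjgl.opengl.GL11;

class ScreenFilterSketch {
    // With per-glyph quads there was no single texture to set this on,
    // which is why it was impossible in the old system.
    static void enableLinearMinFilter(int texId) {
        GL11.glBindTexture(GL11.GL_TEXTURE_2D, texId);
        // Linear minification: distant screens blend instead of shimmering.
        GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_MIN_FILTER, GL11.GL_LINEAR);
        // Nearest magnification keeps close-up text crisp (my choice, not stated in this thread).
        GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_MAG_FILTER, GL11.GL_NEAREST);
    }
}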
I have my own concerns over VRAM usage at scale, though.
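To put a rough number on that (my own back-of-envelope estimate, not from the PR): a T3 screen is 160×50 characters, and at 8×16 px per glyph that's a 1280×800 texture; at 4 bytes per pixel (RGBA8) that's roughly 4 MB per screen, so the 32-screen test discussed above would sit somewhere around 128 MB of VRAM, plus about a third more if mipmaps are generated for linear minification.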
If we've added this to our 1.7.3 milestone, we need the PR author to address the concerns from the review.
@cam72cam before we can merge this, we would need all of the following addressed:
1. How does GPU memory usage read when comparing 32 screens in a superflat scene? GPU-Z should probably work.
2. Keep the old method available via config until it's proven.
3. Ristelle demonstrated on an NVIDIA 650 that we drop from 91 FPS to 52 FPS? That seems bad, doesn't it?
4. asie would like to see linear scaling made the min filter for screens.
5. asie is concerned about VRAM usage at scale. @asiekierka: what scale value matters?
1. I am on Linux and don't have GPU-Z.
2. I'll try to see if I can figure out a way to do this.
3. That's a fair comparison; a lot of the performance of this code depends on the GPU you are using. Intel and AMD seem to work better with my PR, NVIDIA is a toss-up IIRC.
4. I'll see what I can do.
5. asie?
For what it's worth, for NVIDIA GPUs on Linux (with the official drivers), a very useful tool is running nvidia-smi -l to see basically all relevant data at a glance. I know nothing about other GPUs, or about vendor-independent ways to view this data, unfortunately.
@cam72cam I can confirm there is a drop in FPS to half on integrated graphics. This will affect a good number of users, unfortunately. The speedup elsewhere, however, would be a welcome improvement. The right thing going forward is to provide a configuration option to enable this.
We are very close to releasing our 1.7.3 patch. My decision on this PR is three-fold:
- I'm removing the 1.7.3 milestone from this PR.
- I think this change in behavior should be configurable, but off by default. We should continue using the old code by default.
- I think it is okay to only upgrade our 1.12 branch with this feature; I'm not interested in backporting these improvements to older versions.
Sorry I have not been able to work on this PR much. Work has kept me busy and all my current free time goes into IR.
@cam72cam Any updates on this?