High CPU usage due to time spent in OpenGL driver
This may not be specific to Cinder-VR, but it is very obvious when looking at frame timings in the SteamVR developer window.
First, my system: i7 6700, ATI R390 4GB, 64GB RAM, Win10.
I tried this on one of the samples - controllerIntermediate which doesn't do much in terms of rendering (i.e. nothing fancy, not too many items). This should be almost zero CPU beyond the initial setup, with minimal GPU.
The average frame breakdown for controllerIntermediate is:
CPU:
Idle ~4ms
Compositor ~0.2ms
Application (other) ~6ms <<< This should be ZERO!
Application (scene) ~1ms
Late Start: 0ms
GPU:
Idle ~7ms
Other ~0.2ms
Compositor ~1ms
Application (other) ~3ms
Application (scene) ~0ms
By comparison, Google Earth does:
CPU:
Idle ~8.5ms
Compositor ~0.2ms
Application (other) ~0ms
Application (scene) ~2.5ms
Late Start: 0ms
GPU:
Idle ~8ms
Other ~0.1ms
Compositor ~1ms
Application (other) ~3ms
Application (scene) ~0ms
And The Lab which uses DirectX:
CPU:
Idle ~9ms
Compositor ~0.2ms
Application (other) ~0ms
Application (scene) ~1.8ms
Late Start: 0ms
GPU:
Idle ~3ms
Other ~0.1ms
Compositor ~1ms
Application (other) ~7ms
Application (scene) ~0ms
CodeXL (http://gpuopen.com/compute-product/codexl/) profiles show that controllerIntermediate spends most of its CPU time in the driver (DrvPresentBuffers), while Earth spends most of its CPU time in the app itself.
Even though both The Lab and Earth use way more of the GPU, their CPU usage is minimal.
Same goes for the OpenVR hello_vr sample which uses straight up OpenGL and SDL.
CPU:
Idle ~10ms
Compositor ~0.2ms
Application (other) ~0ms
Application (scene) ~0.8ms
Late Start: 0ms
GPU:
Idle ~7ms
Other ~0.1ms
Compositor ~1ms
Application (other) ~3ms
Application (scene) ~0ms
For the complexity of the samples, I'd expect them to be on par with hello_vr, perhaps with a 10% perf penalty for using a convenient library like Cinder. Right now it is about a 700% penalty.
Given that all this time is going to the driver, my guess is that some configuration setting is forcing the driver down a slow path for no good reason, or something along those lines. I'm opening this issue in the hope that someone more familiar with the interaction between OpenGL and Cinder has a better answer.
Thanks.
What does the hello_vr sample draw?
I thought you modeled Cinder-VR off that sample ...
It produces a 20x20x20 matrix of cubes that have a large 4k by 4k texture, while you're in the center of the volume.
The main loop calls a Render method for each eye render target FBO, then blits them to a resolved target for each eye and sends them to the compositor. The Render method for each eye is a call of glDrawArrays for most of the scene with a simple textured shader, followed by some GL_LINES for the controller axes, followed by the same shaders you have in Cinder-VR for the controllers and base station rendering.
I see the same timing profile for all the samples. I went into the draw method of teleportationBasic and reduced it to just:
void TeleportBasicApp::draw()
{
    if( mHmd ) {
        mHmd->bind();
        for( auto eye : mHmd->getEyes() ) {
            mHmd->enableEye( eye );
            //drawScene();
            //mHmd->drawControllers( eye );
        }
        mHmd->unbind();
        mHmd->submitFrame();
    }
}
... so as to remove the rendering-specific cost and just look at the CPU usage of the bootstrap code, and I get the same timing profile. This is a no-op: the headset mirror just draws the Chaperone bounds on top of black, yet I see the same high CPU usage.
Sorry to nitpick, but when you say the same, are you referring to your previous estimations or the hello_vr times?
The drawing code inside the Cinder samples is not efficient at all. When I wrote them, I didn't have time to cache the geometry to be GL friendly, so the cost of constructing and uploading buffers is incurred for each shape at every frame.
- All Cinder-VR samples have high CPU usage that matches this frame timing profile on my machine:
CPU:
Idle ~4ms
Compositor ~0.2ms
Application (other) ~6ms <<< This should be ZERO!
Application (scene) ~1ms
Late Start: 0ms
GPU:
Idle ~7ms
Other ~0.2ms
Compositor ~1ms
Application (other) ~3ms
Application (scene) ~0ms
As noted above, the 6ms (>50% of the allotted CPU time) should be closer to zero. This is the issue.
- When I strip the TeleportBasicApp sample to no-op the draw calls (so that all we have is the boilerplate mechanism to submit frames to the compositor), I get the exact same high CPU usage timing profile as all the other Cinder-VR samples. This shows that the problem lies in the boilerplate infrastructure and not in any of the sample-specific rendering.
- As a point of comparison, I profiled some other apps that use OpenGL and OpenVR - Google Earth and the OpenVR hellovr_opengl sample. The CPU usage in those profiles (provided above) is much smaller than the stripped down boilerplate code above.