SteamVR-for-Linux
                                
[BUG] [Regression] Overlay Wobble/Jitter/Artifacting introduced in 1.15.X
Description
Starting with SteamVR 1.15.X, overlays (both those created via IVROverlay from a game client and the "system" overlay displayed when you hit the "system" button on the Index controllers) wobble, as if they are attached to the HMD with a rubber band. If you move around enough, there will also be significant artifacting occurring in the overlays (though it's a little harder to induce with the system overlays compared to the IVROverlay instances created by ACC).
This only occurs when asynchronous reprojection is enabled.
Reproduction Steps
Steps to reproduce the behavior:
- Launch SteamVR 1.15.2
- Open system overlay
- Move your head around
Expected Behavior
The overlay should stay static in space, and not have artifacts, as it did in SteamVR 1.14.16.
System Information
- Distribution: Arch Linux (amd-staging-drm-next kernel - f24cd554aacf4e989a0796c7d30d758115512732 rebased off of v5.9-rc6 from Linus' tree with rebased futex patches from mcoffin/[email protected], but this behavior is observed on stable channel kernels as well)
- Exact kernel (though likely non-factor since reproducible on stable channel as well) - mcoffin/linux@7d4ad53110a23391a790e8baf22dd48f8a1f0602
- SteamVR version: 1.15.2 (working: 1.14.16)
- Steam client version: Built: Sep 3 2020 at 21:18:20
- Opted into Steam client beta?: No
- Graphics driver version: Mesa 20.2.0 (git-663d464366) (ACO), AMD Radeon RX 5700 XT (NAVI10, DRM 3.40.0, 5.9.0-rc6-1-amd-staging-drm-next-git-00457-g3637df487bdb, LLVM 11.0.0). Mesa versions down to 20.1.4 were also tried, with no change in behavior. Regression still occurs between 1.14.X and 1.15.X.
- Gist for SteamVR System Information: https://gist.github.com/mcoffin/bc4460030a4414e8929c78e252a80766
- CPU: AMD Threadripper 3960X
Additional Context
A cpu profile with sysprof shows a significant amount of time being spent in clock_gettime compared to "working" versions. (~10% of samples for ACC with it open!). This could indicate a busy-wait somewhere that's misbehaving?
Mesa 20.2.0 released a whole bunch of RADV features for timeline syncobj's, and I chased down that path for a while, but observing the same behavior with previous mesa releases ruled the timing of this release out as the issue.
Under normal operation, I'm running a pretty hefty GPU overclock, and setting the min frequency to quite close to the max to achieve livable performance, but disabling this (clean boot with no writing to sysfs), did not affect the observed behavior in any way.
Screenshots
Unfortunately, overlays are only displayed in the HMD for me, so I cannot capture this behavior, though I can look into alternative solutions if upstream developers cannot readily reproduce the issue.
In-progress debugging
- [ ] Hook up a 2070 super from my GF's computer to see if the problem is isolated to amdgpu/mesa interactions with SteamVR
- [x] Downgrade mesa pre-timeline-syncobj implementation (result: no change)
- [x] Use stable kernels (result: no change)
New note: This only occurs when asynchronous reprojection is enabled.
Same here.
System Information (please complete the following information):
- Distribution: Manjaro KDE
- SteamVR version: 1.15.2
- Steam client version: 1602115886
- Opted into Steam client beta?: yes
- Graphics driver version: Mesa 20.1.8 (LLVM 10.0.1)
- SteamVR System Information: SteamVR-2020-10-09-PM_03_08_42.txt
can reproduce:
- Distribution: Arch Linux (5.9.2-zen1-1-zen)
- SteamVR version: tested 1.15.2 - 1.15.5 (worked in 1.14.16)
- Steam client version: Built: Oct 28 2020 at 23:35:02
- Opted into Steam client beta?: Yes
- Graphics driver version: Mesa 20.2.1 on AMD Radeon RX 5700 XT (NAVI10, DRM 3.39.0, 5.9.2-zen1-1-zen, LLVM 10.0.1)
- Gist for System Information: https://gist.github.com/WebFreak001/8fe9ab2ad916efc11a42c483fb6b44b5
- SteamVR System Information: https://gist.github.com/WebFreak001/35b7e824996ec98d763bb6978a74f2c7
- CPU: AMD Ryzen 7 1700X
I have this problem too and now that the beta has been pushed into the main release branch it can no longer be avoided by opting out of the beta.
- Distribution: Arch Linux
- Kernel: Linux 5.9.8-zen1-1-zen #1 ZEN SMP PREEMPT Tue, 10 Nov 2020 22:44:06 +0000 x86_64 GNU/Linux
- Mesa: 20.2.2-2
- GPU: AMD 5700XT
- CPU: AMD 3700X
Edit: the problem also affects xrdesktop windows.
Can reproduce
- SteamVR version: 1.15.10
- Distribution: Arch Linux
- Kernel: Linux 5.9.8-zen1-1-zen #1 ZEN SMP PREEMPT Tue, 10 Nov 2020 22:44:06 +0000 x86_64 GNU/Linux
- Graphics driver version: Mesa 21.0.0-devel, AMD Radeon RX 580 Series (POLARIS10, DRM 3.39.0, 5.9.8-zen1-1-zen, LLVM 11.0.0)
- CPU: AMD Ryzen 5 3600
There is a linux_1.14 branch available in SteamVR as a temporary solution to this problem.
Can reproduce, but not just the overlay. Timing-sensitive games like Beat Saber felt significantly more choppy, which the fps graph confirmed.
- SteamVR version: 1.15.10
- Distribution: Kubuntu 20.04
- Kernel: Linux 5.4.0-54-generic
- Mesa: 20.3.0-rc2
- GPU: AMD Radeon RX 5700
- CPU: AMD Ryzen 5 3600
There is a linux_1.14 branch available in SteamVR as a temporary solution to this problem.
@lostgoat Could I ask what are the details for the system that Valve uses for testing SteamVR on Linux? It has been awhile since there has been a Linux update, specifically for performance. Are there things specifically preventing Valve from getting SteamVR working with good performance (close to / matching Windows) on Linux?
I'm hitting something that may or may not be related.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/4044
@anadon what you are experiencing might rather be related to #377 (which for me happened because of different vulkan driver)
check the solutions I posted there to see if that fixes it for you
@WebFreak001 Nope, different pattern in the specific details for visual corruption, and much different characteristics around how problems emerge. I'm guessing for yours there is an implementation error in Vulkan in your driver. I think for mine it has to do with the memory allocator and/or use before initialization and/or erroneous overwrite of allocated memory. My preview also differs from what I see, whereas your Steam preview matches what you see. I've already fixed a conflicting-drivers issue that was amdgpu-PRO related a few weeks ago, which I needed to do to get VR working at all.
Would like to confirm that doing this fixes the 'wobbly'-ness as stated initially by @mcoffin
- SteamVR version: 1.15.19
- Distribution: Arch Linux
- Kernel: 5.10.8
- Mesa: 20.3.3
- GPU: AMD Radeon RX 5700 XT
- CPU: AMD Ryzen 5 2600
Has been stated at the top of the thread. Unfortunately, we are best off sticking with 1.14 until Valve fixes this.
Yes, though while mcoffin wrote what caused it, it wasn't explained how to turn it off. The source that I found and linked is also a bit old and I was not certain that it would still work.
Because it has not been mentioned: This affects OpenXR quad layers too.
I get this issue too. Initially I created a separate issue for it, but it turns out it's the same as this one.
- Distribution: Gentoo Linux, kernel 5.9.14-gentoo with the newest Mesa drivers pulled from Git and LXDE.
- SteamVR version: 1.16.6
- Steam client version: Dec 20 2020, 23:07:25
- Opted into Steam client beta?: For SteamVR, yes. If you mean Steam itself, no.
- Graphics driver version: Mesa 21.0.0-devel (git-ecac89b732) (ACO)
- Gist for SteamVR System Information: https://gist.github.com/happysmash27/b0365a9032d5a2ca2c68517c990394b0
I also recorded the issue through the lenses of my HMD: https://happysmash27.me/Upload/Screenshots/Videos/Bugs/SteamVR/VID_20210204_050848.webm
Just to chime in on this one guys, in my experience of only about 6 months and ~1300hrs of usage, I've seen that the asynchronous reprojection implementation in SteamVR on Linux is mostly unusable, especially when compared to the stable performance achievable with some manual resolution tuning and just keeping it disabled.
While I originally disabled it to fix this issue, I've seen many other performance benefits to just keeping it off, and I'd recommend other performance-sensitive folks whose hardware isn't going to fall behind do the same for now. There's little reason to stay on 1.14 just to save the ability to use a feature that doesn't quite cut it to begin with.
I'll try that out then, since my hardware is on the low end of VR users (RX 580 8GB, R5 1600X). Thank you for telling us all!
Async Reprojection causes problems because of #269. If you have low-end hardware then you're likely going to be throttling most of the time anyway, so disabling it isn't a good idea (I can't have it off in Boneworks or Blade&Sorcery for example, with a 5700XT. They just stutter too much without it). For stuff like Beat Saber that always runs at max refresh rate, though, it does improve the experience a good bit.
I too experience a lot of headache inducing stuttering with a 5700XT if async is disabled. I'm not so keen to reduce the resolution because it makes it difficult to read text. Are there other tweaks that can make disabling async usable or is reducing resolution the only way?
I'm currently running a 6900XT, but I spent the last year on the 5700XT @duckbytes and @Zamundaaa. I found a few tricks that DRASTICALLY improved the experience.
Even though it may seem like you're throttling, sometimes, what's happening is that during the downtime between frames, the 5700XT will downclock to a lower power state, causing just enough lag on the next frame to temporarily trigger throttling before it clocks back up, does a few frames quickly, and the whole process repeats.
By setting the minimum sclk speed to something relatively high, I was able to completely stabilize my experience, even in ACC, which is a notoriously GPU-intensive game, especially in VR. I was able to run at 90Hz with no issue.
If you want to give it a whirl, here were the settings I was using all year. You may have to tweak some things due to silicon quality and cooler quality (I had a custom liquid loop, so lots of thermal mass to absorb spikey power consumption).
sclk_min: 1950
sclk_max: 2210
mclk_min: 700
mclk_max: 903 # my memory sucks. you might be able to do better than this
power_limit: 300000000 # This means 300W - at 300W often other limits are reached before the card hits this limit, which keeps the clocks as stable as they can be so long as your cooler can handle it
voltage_curve:
  - 750mV @ 800MHz
  - 912mV @ 1450MHz
  - 1230mV @ 2210MHz
Note that the voltage curve can still have points on the curve below the actual minimum sclk frequency, which makes it easier to adapt.
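For reference, the power_limit value above appears to be expressed in microwatts, which is also the unit the amdgpu hwmon power1_cap file uses, hence 300000000 for 300W. A minimal sketch of the conversion follows; the function names are made up for illustration and are not part of amdgpu-smu-od or any other tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helpers: convert between watts and the microwatt values
# used by hwmon power cap files (1 W = 1,000,000 uW).

# $1 = power limit in whole watts
watts_to_microwatts() {
    echo $(( $1 * 1000000 ))
}

# $1 = power limit in microwatts; result is truncated to whole watts
microwatts_to_watts() {
    echo $(( $1 / 1000000 ))
}
```

Calling `watts_to_microwatts 300` gives the value used in the config above.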
In some titles, I had similar problems on the CPU side, but I'm not as familiar as I didn't write the overclocking support on that side, so for those titles I just settled for setting max all-core frequencies, which worked out... okayish on my 3960X
sudo cpupower frequency-set -g performance
Some tools I made throughout this process (in addition to writing the reclocking support for amdgpu):
- fanctl - powerful rule/curve based userspace fan controller. Think of it as an actually usable replacement for fancontrol from lm_sensors. Example config that has been rock solid for me: fanctl.yml.
- amdgpu-smu-od - tool I use to automatically set GPU configuration from YAML files instead of having to do it manually with sysfs all the time. I'll attach an example config file: superclock.yml.
If you still have issues, please reach out. I spent tons of time on this over the last year, and good performance is possible. The biggest gains I saw were from drastically increasing the minimum shader clock (to prevent the downclocking), so I'd start there if I were you.
Which would be a good way to monitor my GPU clock to see if I may also be having a similar issue on my RX 480?
Never mind; radeon-profile (https://github.com/marazmista/radeon-profile) seems to work pretty well, and was in my distribution's repositories.
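Another option that needs no extra tooling is to read the clock levels amdgpu exposes through sysfs. The pp_dpm_sclk file lists the shader clock levels and marks the active one with a trailing `*`. A minimal sketch, assuming that format; `active_sclk_mhz` is a hypothetical helper name, and the card0 path in the comment may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the active shader clock in MHz from a
# pp_dpm_sclk-style file. The active level is the line containing "*",
# for example "1: 1950Mhz *".
# $1 = path to the file, e.g. /sys/class/drm/card0/device/pp_dpm_sclk
active_sclk_mhz() {
    awk '/\*/ { sub(/[Mm][Hh][Zz]/, "", $2); print $2 }' "$1"
}
```

Running this in a loop while a game is open (for example once per second) should show whether the card keeps dropping to a lower level between frames.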
Thanks @mcoffin for the ideas. I want to try out these configurations but I'm hesitant because I don't know enough about overclocking to know if these are suitable for my GPU or aren't going to cause damage. Should I look up frequencies for my particular model or reduce some of the values to account for only having air cooling? The specific card I have is MSI Radeon RX 5700 XT MECH OC 8GB.
I did find some options in radeon-profile that let me fix the frequency under Overclock > Manual frequency control. I can set that to only use the highest frequency (2100, 875 on there by default) by unchecking the other options. Is that more or less doing the same thing and would help with the downclocking bug you mention? Or is there more to it?
Thanks again!
> I want to try out these configurations but I'm hesitant because I don't know enough about overclocking to know if these are suitable for my GPU or aren't going to cause damage
- You're (probably) not going to damage your GPU by overclocking, you'd likely just make it crash. The potential for damage only really comes from running really hot for extended periods of time. As long as you're monitoring Tjunction (and with that MSI card, which has a known memory cooling issue, Tmem) and making sure they're somewhat sane, you should be fine.
- You don't have to push the clocks as far as I did to solve the issue. Keeping your card's default setting for the max sclk and setting the min sclk value to something like sclk_max * 0.8 should cut it.
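The sclk_max * 0.8 rule of thumb is plain integer arithmetic. A rough sketch, where `derive_sclk_min` is a hypothetical helper written for this example and not taken from any of the tools in this thread:

```shell
#!/bin/sh
# Hypothetical helper: derive a minimum shader clock as a whole-number
# percentage of the maximum shader clock.
# $1 = sclk_max in MHz
# $2 = scale in percent (optional, defaults to 80, i.e. sclk_max * 0.8)
derive_sclk_min() {
    max_mhz=$1
    percent=${2:-80}
    # Integer division, so the result is rounded down to a whole MHz.
    echo $(( max_mhz * percent / 100 ))
}
```

With a 2100 MHz maximum this suggests a 1680 MHz minimum at the default 80%.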
I don't use radeon-profile, as the way various generations of cards manage power states and clocks varies so widely. The biggest changes were from the 3/4/500 cards to the 5000-series (polaris -> navi).
Here is a tool I just now wrote for you @duckbytes. If you run it, it won't touch the voltages or anything; it just sets the minimum sclk to a percentage of the maximum sclk value (default: 85%). The -p option can be used to also set a power limit (in watts). Check the readme (or run amd-vr-clocks.sh -h) for more info, but here's an example usage. The -r flag tells the script to reset the card's clock speeds to whatever was default in the card's VBIOS at boot time.
This script should be pretty safe as it won't go beyond what the card is already allowed to boost to (except for increasing power limit if you choose to do so). If you're gonna set a higher power limit, just make sure you keep an eye on the temperatures when you do to make sure it's sane. You don't really want Tjunction to be over 90C that often (though I think AMD says that it's fine to run the 5700XT with Tjmax all the way up to ~98C). With my liquid cooling and a 300W power limit, my Tjunction stays around 73C, but ymmv with the default air cooler (especially on that MSI card).
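To keep an eye on temperatures from a terminal, the hwmon temp*_input files report millidegrees Celsius. Below is a minimal sketch of a threshold check; `temp_exceeds` is a hypothetical helper, and which temp*_input file belongs to the junction sensor is an assumption you should verify against the matching temp*_label file on your card.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if a hwmon temperature
# reading is above a limit.
# $1 = path to a temp*_input file (value in millidegrees Celsius)
# $2 = limit in whole degrees Celsius
temp_exceeds() {
    millideg=$(cat "$1")
    [ $(( millideg / 1000 )) -gt "$2" ]
}
```

For example, `temp_exceeds /path/to/temp2_input 90 && echo "running hot"` would warn once the reading passes 90C (the path here is a placeholder).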
To summarize
- Try it without the -p flag. See how that goes. This is super duper uber safe.
- Try it with the -p flag to set a higher power limit, but keep an eye on temps. The power limit will still obey card VBIOS OD limits, so this is pretty safe so long as you take notice if it's running insanely hot (unlikely).
- You can also override the card's default maximum power limit by using custom powerplay tables to overwrite the VBIOS-provided powerplay table. This is advanced, and this is where you get dangerous, so I didn't include instructions for this, but if you want to take the plunge, then: amdgpu-smu-od + upp + just the registry files from the igor's lab link in this article to get the base64 of some modded powerplay tables (or create one yourself with upp after modifying your dumped pptable from /sys/class/drm/cardX/device/pp_table).
# set min sclk to 85% of max sclk, and 210W power limit
./amd-vr-clocks.sh -p 210
# reset clocks to whatever is in the card's VBIOS, and set power limit to 180W
./amd-vr-clocks.sh -r -p 180
# set min sclk to 85% of max sclk on device at /sys/class/drm/card1/device, without touching power limit
./amd-vr-clocks.sh -d /sys/class/drm/card1/device
# reset clocks on device at /sys/class/drm/card1/device without touching power limit
./amd-vr-clocks.sh -d /sys/class/drm/card1/device -r
EDIT: forgot the link - mcoffin/amd-vr-clocks
@duckbytes let me know how that goes for you, eh?
Usage
Usage:  [-d /path/to/device] [-i DEVICE_INDEX] [-v] [-y] [-p POWER_LIMIT_IN_WATTS] [-s SCALE_PERCENT] [-r] [-h]
Flags:
	-d - device_path (default: /sys/class/drm/card0/device)
	-i - device index
	-v - increase output verbosity
	-y - do not ask for confirmation
	-p - power limit to set (default: none)
	-s - percentage to scale sclk_max when deriving sclk_min (default: 85)
	-r - reset clocks to original settings instead of setting VR mode
	-h - print this help text and exit
@mcoffin thanks so much for making the tool for us. Really good of you.
It works great for me after adding amdgpu.ppfeaturemask=0xffffffff to my boot options.
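If you are not sure whether the boot option took effect, one way to check is to look for it on the running kernel's command line. A small sketch; `ppfeaturemask_from_cmdline` is a hypothetical helper that takes the command line as a string so it can be tried without rebooting.

```shell
#!/bin/sh
# Hypothetical helper: print the value of amdgpu.ppfeaturemask from a
# kernel command line string, or print nothing if it is not set.
# $1 = kernel command line, e.g. the contents of /proc/cmdline
ppfeaturemask_from_cmdline() {
    for arg in $1; do
        case "$arg" in
            amdgpu.ppfeaturemask=*) echo "${arg#amdgpu.ppfeaturemask=}" ;;
        esac
    done
}
```

On a live system it could be called as `ppfeaturemask_from_cmdline "$(cat /proc/cmdline)"`; empty output means the option is missing from the boot options.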
I tried out some worlds in VRChat. It is really decently smooth at the moment. Heavier worlds are still too much without async reprojection, but I can comfortably be in more chilled worlds. I can use xrdesktop again now too since it was failing with SteamVR 1.14 but not the latest version.
Thanks again for the help.
I'm on an Nvidia GTX 1080. I found that having Feral's GameMode active/enabled while running SteamVR made a lot of difference on my Fedora system here. No idea what the effects are when running an AMD card, but you might give it a shot too (if you haven't already).
it seems like in latest SteamVR beta (1.17.6) and stable (1.16.10) ~~there is no async reprojection setting anymore, however~~ the wobble issue still persists.
For workaround see #227
> You don't really want Tjunction to be over 90C that often (though I think AMD says that it's fine to run the 5700XT with Tjmax all the way up to ~98C). With my liquid cooling and a 300W power limit, my Tjunction stays around 73C, but ymmv with the default air cooler (especially on that MSI card).
@mcoffin just for your information, I have a MSI 5700XT GAMING X and junction temperature is around 100°C during most games and in some cases sits 'comfortably' at 105°C. The maximum I've seen is 112°C, though only for a second or so. AMD themselves have stated that junction sensor reporting 110°C is "within specs" in this blogpost.
@WebFreak001 yeah the setting has been gone for a while, I think this is the fourth time that someone, including myself, linked that workaround in here. I wish Valve would just fix this wobbly-mess so that this issue can be closed.
@steffenWi - Check out the fan curve controller I wrote to get those under control! - fanctl. Been working well for me on my 6900XT (not water cooled like my 5700XT) since I got it.