
Fossilize influenced by I/O load on unrelated drives

Open SimplyCorbett opened this issue 2 years ago • 4 comments

Your system information

  • Steam client version (build number or date): latest (Jan 16 2022)
  • Distribution (e.g. Ubuntu): Gentoo
  • Opted into Steam client beta?: No
  • Have you checked for system updates?: Yes

Hardware: 3900X, 64 GB RAM. The rest is described below.

With the following setup and an I/O load, Fossilize drops from maxing out my processor to using only one core at 50%, and then to no CPU usage at all.

Setup:

  • BTRFS RAID0 (2x512GB, mounted at /)
  • BTRFS RAID0 (2x10TB, mounted at /mnt/RAID)
  • BTRFS single drive (1x3TB, mounted at /mnt/RAID/3TB)
  • Steam installs and downloads games in /home/user

How to replicate:

  • Start processing Vulkan shaders.
  • Start transferring files from /mnt/RAID/3TB to /mnt/RAID (this shouldn't cause any problems, as the I/O is not on the main drive).

Fossilize now vanishes from the system and stops working until the transfer is completed, cancelled, or paused.

Reason for bug:

The I/O is not on the main drive that Steam is installed on, so it shouldn't affect Fossilize. On my 3900X the load average is 3, which leaves plenty of CPU for Fossilize to run.

Pausing the transfer results in the CPU hitting 100% after a second or two, with Fossilize using all cores.

SimplyCorbett avatar Feb 03 '22 15:02 SimplyCorbett

This has mainly been discussed here: https://github.com/ValveSoftware/Fossilize/issues/99

The behavior you observe comes from Fossilize watching the kernel's IO PSI (pressure stall information), which is global across all drives. You could try turning the kernel PSI feature off by adding psi=0 to your kernel command line (https://facebookmicrosites.github.io/psi/docs/overview). Your kernel may have it enabled by default.
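To see what that global signal looks like on your own machine, here is a minimal sketch that reads the kernel's IO pressure file. It assumes the standard /proc/pressure/io text format ("some" and "full" lines with avg10, avg60, avg300 and total fields). The function names parse_psi and read_io_pressure are made up for this example and are not taken from Fossilize.

```python
from pathlib import Path


def parse_psi(text):
    """Parse PSI text into a dict like {"some": {"avg10": 0.0, ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each remaining field looks like "avg10=0.00" or "total=12345".
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def read_io_pressure(path="/proc/pressure/io"):
    """Return parsed IO pressure, or None if the file cannot be read (e.g. PSI is off)."""
    try:
        return parse_psi(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    pressure = read_io_pressure()
    if pressure is None:
        print("PSI is not available on this kernel")
    else:
        print("IO pressure:", pressure)
```

Running this while the file copy is in progress should show whether IO pressure rises even though the copy is on a different drive, since the values are system-wide and not per device.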

This change actually included PSI support which fixed desktop stalls for most users: https://github.com/ValveSoftware/Fossilize/commit/200b19c319e2872415d74b5d3479e1624d748bc6

It was introduced because shader compilation is not actually a CPU-only thing, it also involves a lot of inefficient IO (it's actually not much IO but it is pretty random in the driver caches and thus inefficient, especially on btrfs).

kakra avatar Feb 03 '22 19:02 kakra

Yes, Fossilize has to go out of its way to not make other stuff go slow on the system, especially anything related to IO since we can quickly swarm IO caches when 10+ threads hammer out shader caches with non-ideal access patterns.

HansKristian-Work avatar Feb 03 '22 22:02 HansKristian-Work

@HansKristian-Work It may be possible that the dirty pages watcher is a bit too aggressive: If copying large files, dirty data is expected. Maybe it should watch PSI only, and if there is no PSI feature available, it should fall back to a less aggressive dirty pages watcher? Or just make that less aggressive in general?
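The suggestion above could be sketched roughly as follows. This is only an illustration of the proposed policy (prefer PSI, use dirty pages as a fallback), not Fossilize's actual logic. The names parse_dirty_kib and should_pause and both threshold values are assumptions invented for this example.

```python
def parse_dirty_kib(meminfo_text):
    """Return the "Dirty:" value from /proc/meminfo-style text in KiB, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("Dirty:"):
            return int(line.split()[1])
    return None


def should_pause(io_some_avg10, dirty_kib, psi_limit=10.0, dirty_limit_kib=1024 * 1024):
    """Decide whether a background worker should pause.

    io_some_avg10: IO PSI "some avg10" percentage, or None if PSI is unavailable.
    dirty_kib: dirty page cache size in KiB, or None if unknown.
    Both limits are hypothetical values chosen for illustration.
    """
    if io_some_avg10 is not None:
        # PSI is available: ignore dirty pages entirely, so a large copy
        # that produces dirty data without stalls does not pause the worker.
        return io_some_avg10 > psi_limit
    if dirty_kib is not None:
        # Fallback without PSI: a deliberately generous dirty-page limit.
        return dirty_kib > dirty_limit_kib
    return False
```

With this shape, a large file copy that creates a lot of dirty data but little measured IO stall would not pause the worker when PSI is available.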

Personally, I don't care if it pauses when running in the background. It's probably the foreground mode when people would care about it. That said, it works perfectly fine for me in background mode since that change back then - no issues whatsoever. Not sure if the Steam client already uses the control channel to actually switch to aggressive mode when running in foreground. But then again, when such a control option is implemented but Steam doesn't use it, this is probably a Steam bug, not a Fossilize bug.

kakra avatar Feb 04 '22 08:02 kakra

"since we can quickly swarm IO caches when 10+ threads hammer out shader caches with non-ideal access patterns"

That's actually the point here even when copying data on another drive... Fossilize needs to get out of the way of other software using the page cache - no matter if that's a different device.

kakra avatar Feb 04 '22 08:02 kakra