Alexander Monakov comments

78 comments by Alexander Monakov


store forwarding does not have fixed latency

Ooh, and I can stick an ALU op in the dependency chain without increasing overall latency, making best-case forwarding latency 3 cycles on SNB and IVB:

```nasm
loop: mov eax,...
```
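The reasoning in this comment is a simple latency budget. As a sketch, assume (our simplification, not the author's model) that the loop's speed is set by a single loop-carried chain made of the store-to-load forward plus the inserted ALU op, where $T_{\text{iter}}$ is the measured cycles per iteration:

$$
T_{\text{iter}} = L_{\text{fwd}} + L_{\text{ALU}}
\qquad\Longrightarrow\qquad
L_{\text{fwd}} = T_{\text{iter}} - L_{\text{ALU}}
$$

Assuming a single-cycle ALU op ($L_{\text{ALU}} = 1$), inserting it without changing $T_{\text{iter}}$ means the forwarding part of the chain became one cycle shorter. For instance, a loop that still ran at 4 cycles per iteration would put $L_{\text{fwd}}$ at $4 - 1 = 3$ cycles, which can only happen if forwarding latency is not a fixed constant.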

store forwarding does not have fixed latency

> So this loop runs in 4 cycles per iteration?

Amazingly, yes!

store forwarding does not have fixed latency

Indeed, this runs at 3 cycles per iteration too. *Perfection.*

```nasm
loop:
    mov  [rsp], rdi
    imul rsp, 1
    mov  rdi, [rsp]
    dec  ecx
    jnz  loop
```

Writing traces to anything but tmpfs causes hiccups in gameplay

@blackout24, it's odd that it blocks that badly, but the issue seems tied to disk writeback. Can you try tuning page cache writeback according to http://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/ ?

Writing traces to anything but tmpfs causes hiccups in gameplay

How much RAM do you have? Try making background writeback more eager, e.g. starting it once 2 MiB of dirty data has accumulated: `sudo sysctl vm.dirty_background_bytes=$((2**21))`
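The RAM question matters because, when `vm.dirty_background_bytes` is 0, the kernel uses `vm.dirty_background_ratio`, a percentage of available (free plus reclaimable) memory, so a large machine can accumulate a lot of dirty page cache before background writeback starts. Per the kernel's sysctl documentation only one of the two knobs is in effect at a time. The sketch below reads both knobs from `/proc/sys/vm` and gives a rough estimate of the threshold; the helper names are hypothetical, and using a single `available_bytes` figure is an approximation of what the kernel actually computes:

```python
from pathlib import Path

VM_SYSCTL_DIR = Path("/proc/sys/vm")


def read_vm_setting(name, root=VM_SYSCTL_DIR):
    """Return an integer vm.* sysctl value, or None if it can't be read."""
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def background_writeback_threshold(dirty_background_bytes, dirty_background_ratio, available_bytes):
    """Roughly estimate how many dirty bytes build up before background writeback starts.

    A non-zero dirty_background_bytes takes precedence; otherwise the ratio is a
    percentage of available memory, which available_bytes stands in for here.
    """
    if dirty_background_bytes:
        return dirty_background_bytes
    return available_bytes * dirty_background_ratio // 100


if __name__ == "__main__":
    print("vm.dirty_background_bytes =", read_vm_setting("dirty_background_bytes"))
    print("vm.dirty_background_ratio =", read_vm_setting("dirty_background_ratio"))
    gib, mib = 2**30, 2**20
    # Hypothetical 16 GiB machine: a 10% ratio versus the suggested 2 MiB setting.
    print("ratio 10%:", background_writeback_threshold(0, 10, 16 * gib) // mib, "MiB")
    print("bytes 2**21:", background_writeback_threshold(2**21, 10, 16 * gib) // mib, "MiB")
```

On a 16 GiB machine with a 10% ratio, the estimate lands in the gigabyte range, compared with 2 MiB for the suggested setting. That gap is why making writeback more eager can smooth out the stalls.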

vogl_entrypoints.cpp memory hog in 32-bit mode

In steps 3.i and 3.ii most of the optimizations are _not enabled_, since you're not passing `-O` on the command line. This is a frequent "paper cut" with GCC command...
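One way to see this is GCC's `-Q --help=optimizers`, which lists each optimizer flag with its state for the options given. The GCC manual also warns that most optimizations are completely disabled when no `-O` level is set, even if individual `-f` flags are specified, so the listing describes flag states rather than guaranteeing that a pass runs. Below is a small, hypothetical wrapper that counts enabled flags with and without an `-O` level. It assumes the output lines look like `  -fdse   [enabled]`:

```python
import shutil
import subprocess


def parse_optimizer_states(help_text):
    """Map each flag in `gcc -Q --help=optimizers` output to its reported state."""
    states = {}
    for line in help_text.splitlines():
        parts = line.split()
        # Flag lines look like "  -fdse   [enabled]"; skip headers and blank lines.
        if len(parts) >= 2 and parts[0].startswith("-") and parts[-1].startswith("[") and parts[-1].endswith("]"):
            states[parts[0]] = parts[-1][1:-1]
    return states


def query_optimizers(compiler="gcc", opt_level=None):
    """Ask the compiler which optimizer flags are on for a given -O level (or none)."""
    cmd = [compiler, "-Q"]
    if opt_level:
        cmd.append(opt_level)
    cmd.append("--help=optimizers")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_optimizer_states(result.stdout)


if __name__ == "__main__" and shutil.which("gcc"):
    for level in (None, "-O1", "-O2"):
        states = query_optimizers("gcc", level)
        on = sum(1 for state in states.values() if state == "enabled")
        print(level or "(no -O)", on, "of", len(states), "optimizer flags enabled")
```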

vogl_entrypoints.cpp memory hog in 32-bit mode

`-fvisibility=default` constrains some optimizations. I did a few tests with GCC 4.8.1 at `-O1`: RTL DSE and postreload CSE seem to be responsible for the huge memory consumption, `-fno-dse -fdbg-cnt=postreload_cse:0` is a...

vogl_entrypoints.cpp memory hog in 32-bit mode

Trunk still needs `-fno-dse`, but postreload CSE seems to have improved a bit: it still consumes a lot of memory, but does not explode like it did on 4.8.1.
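To repeat this kind of comparison, one option is to compile the same file once per flag set and record the compiler's peak memory. The sketch below is a hypothetical harness. The helper names are ours, and so is the assumption that the file builds with `g++ -m32` plus whatever extra flags you pass, because the real build line isn't shown. It uses `os.wait4` to get the child's resource usage. On Linux `ru_maxrss` is reported in KiB and also reflects children the process reaped, such as `cc1plus` under the `g++` driver:

```python
import os
import sys

# Flag sets from the comments above; include paths/defines for the real build are not shown.
FLAG_SETS = {
    "-O1": ["-O1"],
    "-O1 -fno-dse": ["-O1", "-fno-dse"],
    "-O1 -fno-dse, postreload CSE off": ["-O1", "-fno-dse", "-fdbg-cnt=postreload_cse:0"],
}


def compile_command(compiler, source, flags, extra=()):
    """Build a 32-bit compile-only command line whose object file is discarded."""
    return [compiler, "-m32", *flags, *extra, "-c", source, "-o", os.devnull]


def run_with_peak_rss(argv):
    """Run argv as a child process and return (exit code, peak RSS in KiB on Linux)."""
    pid = os.fork()
    if pid == 0:  # child: replace this process with the command
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed (e.g. command not found)
    _, status, usage = os.wait4(pid, 0)
    return os.waitstatus_to_exitcode(status), usage.ru_maxrss


if __name__ == "__main__" and len(sys.argv) > 1:
    source, extra = sys.argv[1], sys.argv[2:]
    for label, flags in FLAG_SETS.items():
        code, kib = run_with_peak_rss(compile_command("g++", source, flags, extra))
        print(f"{label:36} exit={code} peak RSS={kib / 1024:.0f} MiB")
```

Run it with the source file followed by the include and define flags from the real build. Forking and waiting on each child separately keeps the measurements independent. `RUSAGE_CHILDREN` would instead report one maximum across all runs.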