
cpu-o3: executing MFENCE in two parallel SMT threads causes hang

Open nmosier opened this issue 10 months ago • 3 comments

Describe the bug

Executing an x86 MFENCE instruction in two sibling SMT threads on the O3 CPU causes the CPU to get stuck. Specifically, it appears that one of the MFENCE instructions reaches the head of the ROB but never becomes ready to execute.

This simple assembly proof of concept (mfence.asm) triggers the bug:

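        global _start

_start:
        mov dword [rsp], eax    ; store to the stack, so the fence has a store to order
        mfence                  ; the instruction that never completes under SMT

        mov eax, 60             ; Linux x86-64 syscall number for exit
        mov edi, 0              ; exit status 0
        syscall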

Affects version develop @ c54039d

gem5 Modifications None

To Reproduce

  1. Build gem5: scons build/X86/gem5.opt
  2. Assemble the POC: nasm -felf64 -o mfence.o mfence.asm && ld -o mfence mfence.o
  3. Run it under SMT: ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU --smt --caches -c './mfence;./mfence'

Terminal Output

$ timeout -s INT 10 ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU -c './mfence;./mfence' --smt --caches 
gem5 Simulator System.  https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version DEVELOP-FOR-24.0
gem5 compiled Apr 20 2024 20:38:46
gem5 started Apr 20 2024 21:17:07
gem5 executing on cafe-cet, pid 1395698
command line: ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU -c './mfence;./mfence' --smt --caches

warn: The se.py script is deprecated. It will be removed in future releases of  gem5.
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
src/arch/x86/linux/se_workload.cc:76: warn: Unknown operating system; assuming Linux.
src/arch/x86/linux/se_workload.cc:76: warn: Unknown operating system; assuming Linux.
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
src/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
**** REAL SIMULATION ****
src/sim/simulate.cc:199: info: Entering event queue @ 0.  Starting simulation...
Exiting @ tick 15376263500 because user interrupt received

As you can see, the simulation is still running when timeout interrupts it after 10 seconds; the two threads never both exit.

Expected behavior Both threads should exit immediately.

Host Operating System Ubuntu 22.04

Host ISA X86

Compiler used GCC 11.4.0

Additional information None yet

nmosier Apr 20 '24 21:04

I think I've found the root cause of the issue. Thread 0 exits in the same cycle that thread 1's MFENCE instruction [sn:74] is sent from IEW to commit (via the toCommit time buffer). Thread 0 exiting causes a call to CPU::exitThreads() -> CPU::haltContext() -> CPU::removeThread(). This code causes the problem:

void
CPU::removeThread(ThreadID tid)
{
    // ...
    // Flush out any old data from the time buffers.
    for (int i = 0; i < timeBuffer.getSize(); ++i) {
        timeBuffer.advance();
        fetchQueue.advance();
        decodeQueue.advance();
        renameQueue.advance();
        // This discards any completed instructions that are in flight to
        // commit, even those belonging to other threads!
        iewQueue.advance();
    }
    // ...
}

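To illustrate the mechanism, here is a minimal standalone sketch. It does not use gem5's real TimeBuffer class: ToyInst and ToyTimeBuffer are hypothetical stand-ins, and the buffer size of 5 is an arbitrary assumption. It only shows why advancing a buffer getSize() times in a row, with no stage reading in between, drops an entry that belongs to a thread other than the one being removed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for an in-flight instruction; not a gem5 type.
struct ToyInst {
    int tid;     // owning hardware thread
    int seqNum;  // sequence number, e.g. 74 for the MFENCE in this report
};

// Hypothetical toy model of a pipeline time buffer: a fixed number of
// slots, where the front slot is the next one the consumer would read.
class ToyTimeBuffer {
  public:
    explicit ToyTimeBuffer(std::size_t size) : slots(size) {}

    std::size_t getSize() const { return slots.size(); }

    // The producer writes into the newest slot.
    void push(const ToyInst &inst) { slots.back().push_back(inst); }

    // Advance one cycle. If nobody read the front slot first, its
    // contents are simply gone.
    void advance() {
        slots.pop_front();
        slots.emplace_back();
    }

    // Count the entries still in flight for one thread.
    std::size_t countForThread(int tid) const {
        std::size_t n = 0;
        for (const auto &slot : slots)
            for (const auto &inst : slot)
                if (inst.tid == tid)
                    ++n;
        return n;
    }

  private:
    std::deque<std::vector<ToyInst>> slots;
};

// Mimics the flush loop: thread 0 exits while thread 1's instruction
// is in flight. Returns how many thread-1 entries survive.
std::size_t flushWhileOtherThreadInFlight() {
    ToyTimeBuffer iewQueue(5);   // size 5 is an assumption
    iewQueue.push({1, 74});      // thread 1's MFENCE heading to commit
    for (std::size_t i = 0; i < iewQueue.getSize(); ++i)
        iewQueue.advance();      // flush triggered by thread 0 exiting
    return iewQueue.countForThread(1);
}
```

In this toy model flushWhileOtherThreadInFlight() returns 0: thread 1's entry is discarded even though only thread 0 was being removed, which matches the symptom of the MFENCE never being marked complete.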

nmosier Apr 20 '24 23:04

One fix would be to clear only the exiting thread's state from each time buffer (or just from iewQueue, since that is the one causing problems), rather than nuking the contents of all the time buffers.
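A rough sketch of that idea, again on a toy model rather than gem5's actual time buffer types: ToyInst, ToySlots, clearThread, and countForThread are all hypothetical names, and a real patch would have to handle whatever per-thread fields the actual buffer structs carry. The point is only that entries are filtered by thread ID and the slots themselves are not advanced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for an in-flight instruction; not a gem5 type.
struct ToyInst {
    int tid;     // owning hardware thread
    int seqNum;  // sequence number
};

// Hypothetical toy time buffer: one vector of instructions per slot.
using ToySlots = std::deque<std::vector<ToyInst>>;

// Remove only the entries owned by the exiting thread. Slot positions
// are left alone, so other threads' entries still arrive on schedule.
void clearThread(ToySlots &slots, int tid) {
    for (auto &slot : slots) {
        slot.erase(std::remove_if(slot.begin(), slot.end(),
                                  [tid](const ToyInst &inst) {
                                      return inst.tid == tid;
                                  }),
                   slot.end());
    }
}

// Count the entries still in flight for one thread.
std::size_t countForThread(const ToySlots &slots, int tid) {
    std::size_t n = 0;
    for (const auto &slot : slots)
        n += static_cast<std::size_t>(
            std::count_if(slot.begin(), slot.end(),
                          [tid](const ToyInst &inst) {
                              return inst.tid == tid;
                          }));
    return n;
}
```

For example, with one thread-0 entry and one thread-1 entry in the same slot, clearThread(slots, 0) leaves the thread-1 entry in place and the number of slots unchanged.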

nmosier Apr 20 '24 23:04

Thanks @nmosier! I think your solution is correct.

BobbyRBruce Apr 22 '24 18:04