gem5
cpu-o3: executing MFENCE in two parallel SMT threads causes hang
Describe the bug
Executing an x86 `mfence` instruction in two sibling SMT threads on the O3 CPU causes the CPU to hang. Specifically, one of the `mfence`s appears to reach the head of the ROB but never becomes ready to execute.
This minimal assembly proof of concept (`mfence.asm`) triggers the bug:
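```nasm
global _start
_start:
    mov dword [rsp], eax    ; a store that precedes the fence
    mfence                  ; full memory fence
    mov eax, 60             ; Linux x86-64 syscall number for exit
    mov edi, 0              ; exit status 0
    syscall
```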
Affects version: develop @ c54039d
gem5 modifications: None
To Reproduce
- Build gem5:
  ```sh
  scons build/X86/gem5.opt
  ```
- Assemble and link the POC:
  ```sh
  nasm -felf64 -o mfence.o mfence.asm && ld -o mfence mfence.o
  ```
- Run two copies of it as SMT threads on one O3 CPU:
  ```sh
  ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU --smt --caches -c './mfence;./mfence'
  ```
Terminal Output
```text
$ timeout -s INT 10 ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU -c './mfence;./mfence' --smt --caches
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version DEVELOP-FOR-24.0
gem5 compiled Apr 20 2024 20:38:46
gem5 started Apr 20 2024 21:17:07
gem5 executing on cafe-cet, pid 1395698
command line: ./build/X86/gem5.opt configs/deprecated/example/se.py --cpu-type=X86O3CPU -c './mfence;./mfence' --smt --caches
warn: The se.py script is deprecated. It will be removed in future releases of gem5.
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
src/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
src/arch/x86/linux/se_workload.cc:76: warn: Unknown operating system; assuming Linux.
src/arch/x86/linux/se_workload.cc:76: warn: Unknown operating system; assuming Linux.
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
system.remote_gdb: Listening for connections on port 7000
src/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
**** REAL SIMULATION ****
src/sim/simulate.cc:199: info: Entering event queue @ 0. Starting simulation...
Exiting @ tick 15376263500 because user interrupt received
```
As the output shows, the simulation is still running when `timeout` interrupts it after 10 seconds; the two threads never both exit.
Expected behavior
Both threads should exit immediately.
Host operating system: Ubuntu 22.04
Host ISA: X86
Compiler used: GCC 11.4.0
Additional information: None yet
I think I've found the root cause of the issue. Thread 0 exits in the same cycle in which thread 1's MFENCE instruction [sn:74] is sent from IEW to commit (via the `toCommit` time buffer). Thread 0's exit triggers the call chain `CPU::exitThreads()` -> `CPU::haltContext()` -> `CPU::removeThread()`, and this code in `CPU::removeThread()` causes the problem:
```cpp
void
CPU::removeThread(ThreadID tid)
{
    // ...

    // Flush out any old data from the time buffers.
    for (int i = 0; i < timeBuffer.getSize(); ++i) {
        timeBuffer.advance();
        fetchQueue.advance();
        decodeQueue.advance();
        renameQueue.advance();
        // This drops every in-flight instruction headed to commit, including
        // completed instructions that belong to *other* threads!
        iewQueue.advance();
    }

    // ...
}
```
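The flush loop advances each buffer once per slot. Assuming each `advance()` recycles exactly one slot (which is how gem5's `TimeBuffer` appears to behave), that wipes the whole buffer rather than just the exiting thread's entries. The toy model below illustrates the effect. It is not gem5 code: `Msg`, `ToyTimeBuffer`, `inFlight()` and `flushLikeRemoveThread()` are hypothetical names, and the ring of slots is a simplified stand-in for the real time buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy message: one instruction travelling from IEW to commit.
struct Msg {
    int tid;     // SMT thread that owns the instruction
    long seqNum; // instruction sequence number, e.g. 74 for [sn:74]
};

// Simplified, hypothetical stand-in for a time buffer: a ring of slots, one
// per cycle. Each advance() moves time forward one cycle and recycles (clears)
// one slot, so advancing getSize() times clears every slot.
class ToyTimeBuffer {
  public:
    explicit ToyTimeBuffer(std::size_t size) : slots_(size) {
        assert(size > 0); // a zero-sized ring would make advance() divide by zero
    }

    std::size_t getSize() const { return slots_.size(); }

    // Write a message into the slot for the current cycle.
    void send(const Msg &m) { slots_[base_].push_back(m); }

    void advance() {
        base_ = (base_ + 1) % slots_.size();
        slots_[base_].clear(); // recycled slot loses whatever it held
    }

    // Number of messages for `tid` still held anywhere in the buffer.
    std::size_t inFlight(int tid) const {
        std::size_t n = 0;
        for (const auto &slot : slots_)
            for (const auto &m : slot)
                if (m.tid == tid)
                    ++n;
        return n;
    }

  private:
    std::vector<std::vector<Msg>> slots_;
    std::size_t base_ = 0;
};

// Mirrors the flush loop in CPU::removeThread(): advance once per slot.
// The exiting thread's id is never consulted, which is exactly the problem.
void flushLikeRemoveThread(ToyTimeBuffer &buf, int /*exitingTid*/) {
    for (std::size_t i = 0; i < buf.getSize(); ++i)
        buf.advance();
}
```

For example, after `buf.send({1, 74})` (thread 1's completed MFENCE) on a 5-slot buffer, calling `flushLikeRemoveThread(buf, 0)` for the exiting thread 0 leaves `buf.inFlight(1) == 0`. In the real CPU, the equivalent loss would mean commit never learns that [sn:74] has executed, which is consistent with the MFENCE sitting at the head of the ROB indefinitely.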
One fix would be to remove only the exiting thread's entries from each time buffer (or just from `iewQueue`, since that is the one causing the problem), rather than discarding the entire contents of every time buffer.
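A minimal sketch of that idea, again on toy types rather than gem5's: walk every slot and erase only the entries owned by the exiting thread. `Msg`, `Slot` and `removeThreadEntries()` are hypothetical names; gem5's real IEW-to-commit entries store instructions in a different layout, so an actual patch would need to apply the same per-thread filtering to those structures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy message: one instruction travelling from IEW to commit.
struct Msg {
    int tid;     // SMT thread that owns the instruction
    long seqNum; // instruction sequence number
};

// One cycle's worth of messages in a toy time buffer.
using Slot = std::vector<Msg>;

// Drop only the exiting thread's in-flight entries (erase-remove idiom),
// leaving sibling threads' instructions, such as a completed MFENCE,
// in place so that commit still receives them.
void removeThreadEntries(std::vector<Slot> &slots, int tid) {
    assert(tid >= 0); // expects a valid thread id
    for (Slot &slot : slots) {
        slot.erase(std::remove_if(slot.begin(), slot.end(),
                                  [tid](const Msg &m) { return m.tid == tid; }),
                   slot.end());
    }
}
```

With thread 0 exiting, an entry such as `{1, 74}` survives the cleanup while thread 0's own entries are removed, so the sibling thread's MFENCE can still complete and commit.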
Thanks @nmosier! I think your solution is correct.