gem5 icon indicating copy to clipboard operation
gem5 copied to clipboard

CLFLUSH causes TLB assertion error

Open hnpl opened this issue 1 year ago • 2 comments

Describe the bug

Executing clflush instruction in FS mode caused this assertion error,

src/sim/simulate.cc:199: info: Entering event queue @ 14392355269750.  Starting simulation...                                                                                            
src/sim/simulate.cc:199: info: Entering event queue @ 15473386459500.  Starting simulation...                                                                      
src/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored                                                                
src/sim/simulate.cc:199: info: Entering event queue @ 15474400558750.  Starting simulation...                                                                      
src/mem/ruby/system/Sequencer.cc:668: warn: Replacement policy updates recently became the responsibility of SLICC state machines. Make sure to setMRU() near callba
cks in .sm files!                        
gem5.opt: src/cpu/translation.hh:257: void gem5::DataTranslation<ExecContextPtr>::finish(const Fault&, const RequestPtr&, gem5::ThreadContext*, gem5::BaseMMU::Mode)
 [with ExecContextPtr = gem5::TimingSimpleCPU*; gem5::Fault = std::shared_ptr<gem5::FaultBase>; gem5::RequestPtr = std::shared_ptr<gem5::Request>]: Assertion `mode
== state->mode' failed.                  
Program aborted at tick 15478398983501

The problem seems to be that,

  • In the gem5's X86 ISA implementation, clflush* is a store instruction.
  • In TimingSimpleCPU, the translation of a store instruction will set the TLB to BaseMMU::Write mode.
  • While translation is happening, the state of the translation is changed to BaseMMU::Read as the mem request caused by clflush is a isCacheClean request. https://github.com/gem5/gem5/blob/v23.1.0.0/src/arch/x86/tlb.cc#L556-L564
  • This resulted in a state mismatch, and subsequently caused the mention assertion error, which requires the state of the TLB and the state of translation to be the same.

Affects version v23.1

gem5 Modifications N/A

Additional information Related issue: #226 Related change: #592

Reverting #592 fixes the error on my side, but it's not ideal due to issue #226. I think we can make the clflush* instructions to be load instructions, but I'm not sure if that change is functionally correct.

hnpl avatar Feb 09 '24 23:02 hnpl

A possible fix is to change line 257 to:

// CLFLUSHOPT/WB/FLUSH is treated as read for protection checks but modelled as write
assert(mode == state->mode || req->isCacheClean());

https://github.com/gem5/gem5/blob/c890e6b113a59caeec3eea9bafea4a03a471f89d/src/cpu/translation.hh#L257

AKKamath avatar Feb 10 '24 00:02 AKKamath

Thanks for the fix @Lukas-Zenick. I wonder if this error only occurs for the timing CPU or if it also happens when using the O3 CPU?

If it's only the timing CPU, then maybe we should update the TimingSimpleCPU::writeMem function. If it happens for all CPU models, then I think your fix in #1080 is correct.

powerjg avatar May 09 '24 18:05 powerjg