Assertion when prefetcher is turned on

Open githubchik opened this issue 9 years ago • 6 comments

I am hitting an issue similar to https://github.com/s5z/zsim/issues/34. It occurs with a prefetcher between l1d and l2 and with mem type MD1. Without the prefetcher, the error does not occur. I would appreciate any comments on how to fix it. Here is the error:

Running on 24 Cores...
[S 0] Thread 28 starting
[S 0] Thread 29 starting
[S 0] Thread 30 starting
[S 0] Failed assertion on build/opt/coherence_ctrls.cpp:109 '*state == S || *state == E' (with '0 == 1 || 0')
[S 0] [28] Internal exception detected:
[S 0] [28] Code: 1
[S 0] [28] Address: 0x7ffff7195727
[S 0] [28] Description: Exception Code: ACCESS_INVALID_ADDRESS. Exception Address = 0x7ffff7195727. Access Type: UNKNOWN. Access Address = 0x000000000
[S 0] [28] Caused by invalid access to address 0x0

Here is the backtrace:

[S 0] [28] Backtrace (13/40 max frames)
[S 0] [28] /file0/Monolithic_3D_Work/Architecture/latest_zsim/zsim_Centos6/zsim/build/opt/zsim.cpp:1401 / InternalExceptionHandler(unsigned int, LEVEL_BASE::EXCEPTION_INFO*, LEVEL_VM::PHYSICAL_CONTEXT*, void*)
[S 0] [28] sha1.c:0 / LEVEL_PINCLIENT::IEH_CALLBACKS::NotifyInternalException(unsigned int, LEVEL_BASE::EXCEPTION_INFO*, LEVEL_VM::CONTEXT*)
[S 0] [28] /rsghome/sadegh/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL21HandleExceptionInToolEPNS_5PCTXTEPN10LEVEL_BASE14EXCEPTION_INFOE+0x1e8) [0x30753b58]
[S 0] [28] /rsghome/sadegh/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL19InternalHandlerSyncEiPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTEPPKNS_14SCT_ATTRIBUTESEPNS_5PCTXTE+0x33f) [0x3076686f]
[S 0] [28] /rsghome/sadegh/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL20HandlePhysicalSignalEPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTE+0x136) [0x30767816]
[S 0] [28] /rsghome/sadegh/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/intel64/bin/pinbin(_ZN5PINVM28SIGNAL_DETAILS_LINUX_INTEL6415InternalHandlerEiPN7BARECRT8SIGXINFOEPv+0xa4) [0x3080bdd4]
[S 0] [28] /rsghome/sadegh/pin-2.10-45467-gcc.3.4.6-ia32_intel64-linux/intel64/bin/pinbin(BARECRT_SigReturnRt+0) [0x3083b910]
[S 0] [28] /file0/Monolithic_3D_Work/Architecture/latest_zsim/zsim_Centos6/zsim/build/opt/coherence_ctrls.cpp:139 / MESIBottomCC::processAccess(unsigned long, unsigned int, AccessType, unsigned long, unsigned int, unsigned int)
[S 0] [28] /file0/Monolithic_3D_Work/Architecture/latest_zsim/zsim_Centos6/zsim/build/opt/coherence_ctrls.h:471 / MESITerminalCC::processAccess(MemReq const&, int, unsigned long, unsigned long)
[S 0] [28] /file0/Monolithic_3D_Work/Architecture/latest_zsim/zsim_Centos6/zsim/build/opt/cache.cpp:78 / Cache::access(MemReq&)
[S 0] [28] /file0/Monolithic_3D_Work/Architecture/latest_zsim/zsim_Centos6/zsim/build/opt/filter_cache.h:186 / FilterCache::replace(unsigned long, unsigned int, bool, unsigned long)
[S 0] [28] /file0/Monolithic_3D_Work/Architecture/latest_zsim/zsim_Centos6/zsim/build/opt/filter_cache.h:150 / load
[S 0] [28] [0x7fffe4ed63fb]
C:Tool (or Pin) caused signal 11 at PC 0x7ffff7195727
[H] Child 20918 done
[H] Panic on build/opt/zsim_harness.cpp:118: Child 20918 (idx 0) exit was anomalous, killing simulation

Here is my cfg file snippet:

caches = {
    l1i = {
        array = {
            type = "SetAssoc";
            ways = 4;
        };
        caches = 4;
        latency = 1;
        parent = "l2";
        size = 32768; # 32KB
    };

    l1d = {
        array = {
            type = "SetAssoc";
            ways = 8;
        };
        caches = 4;
        latency = 2;
        parent = "l2prefetcher";

        size = 32768; # 32KB
    };

    l2prefetcher = {
        isPrefetcher = true;
        parent = "l2";
        prefetchers = 4;
    };

    l2 = {
        array = {
            type = "SetAssoc";
            ways = 8;
        };
        caches = 1;
        banks = 1;
        latency = 2;
        Wrlatency = 26;
        parent = "mem";
        repl = {
            type = "LRUNoSh";
        };
        size = 268435456; # 256MB
    };

};
frequency = 6000;
lineSize = 64;
mem = {
    controllers = 1024;
    type = "MD1";
    latency= 16;
    wrLatency= 69;
    bandwidth= 32768;
};
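
As a quick sanity check on the byte counts in the snippet above, here is a minimal C++ sketch. It assumes, as the `# 32KB` comments suggest, that `size` is given in bytes. The helper names `KiB` and `MiB` and the constants are hypothetical and not part of zsim. It shows that 32768 bytes is 32 KiB, and that 268435456 bytes works out to 256 MiB (256 KiB would be 262144 bytes).

```cpp
#include <cstdint>

// Hypothetical helpers (not part of zsim): convert binary-prefixed sizes to bytes.
constexpr std::uint64_t KiB(std::uint64_t n) { return n * 1024ULL; }
constexpr std::uint64_t MiB(std::uint64_t n) { return n * 1024ULL * 1024ULL; }

// Values copied from the cfg snippet above.
constexpr std::uint64_t kL1Size = 32768;      // l1i / l1d "size"
constexpr std::uint64_t kL2Size = 268435456;  // l2 "size"

// Compile-time checks: the build fails if any of these relations is wrong.
static_assert(kL1Size == KiB(32), "32768 bytes is 32 KiB");
static_assert(kL2Size == MiB(256), "268435456 bytes is 256 MiB");
static_assert(KiB(256) == 262144, "256 KiB would be 262144 bytes");
```

Because the checks are `static_assert`s, simply compiling the file is enough to confirm the arithmetic.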

githubchik avatar Jun 07 '16 20:06 githubchik

Hi

A couple of weeks ago, when I wrote post #119, I made some changes to the prefetcher code so that it works with the weave model.

As far as I can tell, the reason you always hit this assertion with "Access Address = 0x000000000" is that the new requests ("new petitions") create and modify timing records without a TimingEvent to handle them in the weave phase. I don't remember exactly whether this happens because the contention model receives a TimingRecord whose StartEvent is NULL (which seems likely, given the log you posted), or because of another assert, where someone in the chain calls evRec->popRecord() and leaves nothing for the contention model to operate on.
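
To make the two failure modes described above concrete, here is a minimal C++ sketch. Every type and function in it (`SketchTimingEvent`, `SketchTimingRecord`, `SketchEventRecorder`, `isUsable`, `takeUsableRecord`) is a hypothetical, simplified stand-in and not zsim's real TimingEvent, TimingRecord, or event recorder code. It only illustrates the idea of checking that a record exists and carries non-null events before a consumer dereferences it.

```cpp
#include <cstdint>
#include <vector>

// All types below are hypothetical stand-ins, NOT zsim's real classes.
struct SketchTimingEvent {
    std::uint64_t cycle;  // cycle at which the event would fire
};

struct SketchTimingRecord {
    std::uint64_t reqCycle;
    std::uint64_t respCycle;
    const SketchTimingEvent* startEvent;  // null if nobody created an event
    const SketchTimingEvent* endEvent;
};

// A record is only safe to hand to a consumer if both events exist.
constexpr bool isUsable(const SketchTimingRecord& r) {
    return r.startEvent != nullptr && r.endEvent != nullptr;
}

// Minimal recorder: a stack of records with a non-crashing pop.
class SketchEventRecorder {
public:
    void pushRecord(const SketchTimingRecord& r) { records_.push_back(r); }
    bool hasRecord() const { return !records_.empty(); }

    // Copies the top record into `out` and removes it; returns false
    // (leaving `out` untouched) when an earlier pop already emptied the stack.
    bool popRecord(SketchTimingRecord& out) {
        if (records_.empty()) return false;
        out = records_.back();
        records_.pop_back();
        return true;
    }

private:
    std::vector<SketchTimingRecord> records_;
};

// Consumer-side guard: true only if a record was available AND usable.
inline bool takeUsableRecord(SketchEventRecorder& rec, SketchTimingRecord& out) {
    return rec.popRecord(out) && isUsable(out);
}

// Example records: one well-formed, one with the null events described above.
constexpr SketchTimingEvent kExampleEvent{100};
constexpr SketchTimingRecord kGoodRecord{90, 100, &kExampleEvent, &kExampleEvent};
constexpr SketchTimingRecord kBrokenRecord{90, 100, nullptr, nullptr};
static_assert(isUsable(kGoodRecord), "record with both events is usable");
static_assert(!isUsable(kBrokenRecord), "record with null events is rejected");
```

A caller would push records while handling an access and use `takeUsableRecord` on the consumer side, treating a `false` result as "no timing information" instead of dereferencing a null pointer.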

Again, that's the main reason I asked in #119 for a better understanding of how the TimingEvent and TimingRecord structures are used, and of the weave phase overall. There are already good questions and answers in the forum (e.g. #53), but there is still room for clarification and documentation.

I'll make the patch presentable and send it over.

rommelsv avatar Jun 09 '16 09:06 rommelsv

OK. The branch that contains the patch is here: https://github.com/rommelsv/zsim/tree/initial-pf
That branch actually has two new features: a variable that controls the size at which the HDF5 output is written to disk, and a setting for the number of entries you want the prefetcher to handle. There is also a sample file that shows how to use them.

Remember, this is just a patch to make the prefetcher work with the weave models. There are still some things to debug before it works properly. Also, keep in mind that the comments are OK: this is the DCU version for the Westmere architecture. Any update that extends the prefetcher itself, for L1-L2 but also for upper levels, will be very welcome.

rommelsv avatar Jun 11 '16 18:06 rommelsv

Thanks rommelsv for posting the patch. I tried it but am getting the following compile errors while building zsim:

build/opt/virt/patchdefs.h: In function ‘void VirtInit()’:
build/opt/virt/patchdefs.h:41:4: error: ‘SYS_getcpu’ was not declared in this scope
 PF(SYS_getcpu, PatchGetcpu);
 ^
build/opt/virt/virt.cpp:68:48: note: in definition of macro ‘PF’
 #define PF(syscall, pfn) prePatchFunctions[syscall] = pfn;
 ^
scons: *** [build/opt/virt/virt.os] Error 1
build/opt/init.cpp: In function ‘CacheGroup* BuildCacheGroup(Config&, const string&, bool)’:
build/opt/init.cpp:406:14: error: redeclaration of ‘uint32_t size’
 uint32_t size = config.get<uint32_t>(prefix + "size", 64*1024);
 ^
build/opt/init.cpp:381:14: error: ‘uint32_t size’ previously declared here
 uint32_t size = config.get<uint32_t>(prefix + "size", 64*1024);
 ^
build/opt/init.cpp:407:14: error: redeclaration of ‘uint32_t banks’
 uint32_t banks = config.get<uint32_t>(prefix + "banks", 1);
 ^
build/opt/init.cpp:382:14: error: ‘uint32_t banks’ previously declared here
 uint32_t banks = config.get<uint32_t>(prefix + "banks", 1);
 ^
build/opt/init.cpp:408:14: error: redeclaration of ‘uint32_t caches’
 uint32_t caches = config.get<uint32_t>(prefix + "caches", 1);
 ^
build/opt/init.cpp:383:14: error: ‘uint32_t caches’ previously declared here
 uint32_t caches = config.get<uint32_t>(prefix + "caches", 1);
 ^
build/opt/init.cpp:410:14: error: redeclaration of ‘uint32_t bankSize’
 uint32_t bankSize = size/banks;
 ^
build/opt/init.cpp:387:14: error: ‘uint32_t bankSize’ previously declared here
 uint32_t bankSize = size/banks;
 ^
scons: *** [build/opt/init.os] Error 1
scons: building terminated because of errors.

githubchik avatar Jun 14 '16 19:06 githubchik

Hey githubchik, sorry, one of those errors (the one in init.cpp) is my fault. It seems I got confused while syncing with the latest zsim version; I generally work with an older one, for consistency with my other tests. I pushed one more commit, so it might be "ready" now. For the other error, take a look at #1. Let me know how it goes.
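
For readers hitting the `SYS_getcpu` error, one generic workaround when the system headers do not define that macro is a guarded fallback definition. The sketch below is an assumption about what such a fallback could look like; it is not taken from #1 or from the zsim sources, and the constant `kGetcpuSyscall` is a hypothetical name used only for illustration. The value 309 is the `getcpu` syscall number on x86-64 Linux.

```cpp
#include <sys/syscall.h>  // Linux: provides SYS_* / __NR_* where the headers know them

// Define SYS_getcpu only if the system headers did not already do so.
#ifndef SYS_getcpu
#  if defined(__NR_getcpu)
#    define SYS_getcpu __NR_getcpu   // reuse the kernel header's number
#  elif defined(__x86_64__)
#    define SYS_getcpu 309           // getcpu syscall number on x86-64 Linux
#  else
#    error "SYS_getcpu is unknown for this architecture; define it manually"
#  endif
#endif

// Hypothetical constant, just to show the macro is now usable in code.
constexpr long kGetcpuSyscall = SYS_getcpu;
static_assert(kGetcpuSyscall > 0, "syscall numbers are positive");
```

A guard like this would have to appear before the first use of `SYS_getcpu`; whether it is appropriate for a given build depends on the kernel and libc versions involved.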

rommelsv avatar Jun 15 '16 14:06 rommelsv

Hi @rommelsv, I am now using your patch for prefetching in zsim. It worked with your simple-pf.cfg, but when I tried another benchmark (from the Galois graph framework), it failed with a similar problem: an access to an invalid address. The detailed report is below.

[S 0] pfc-0: pos 58 stride 1 conf 2 lastPrefetchPos 56 prefetchPos 59 fetchDepth 1
[S 0] [0] Internal exception detected:
[S 0] [0] Code: 1
[S 0] [0] Address: 0x7ffff632dfb7
[S 0] [0] Description: Exception Code: ACCESS_INVALID_ADDRESS. Exception Address = 0x7ffff632dfb7. Access Type: UNKNOWN. Access Address = 0x000000008
[S 0] [0] Caused by invalid access to address 0x8
[S 0] [0] Backtrace (12/40 max frames)
[S 0] [0] /home/chao/git_repos/rommelsv-zsim/build/opt/zsim.cpp:1392 / InternalExceptionHandler
[S 0] [0] :? / LEVEL_PINCLIENT::IEH_CALLBACKS::NotifyInternalException(unsigned int, LEVEL_BASE::EXCEPTION_INFO*, LEVEL_VM::CONTEXT*)
[S 0] [0] /home/chao/git_repos/pin/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL19InternalHandlerSyncEiPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTEPPKNS_14SCT_ATTRIBUTESEPNS_5PCTXTEPj+0x444) [0x3043a9454]
[S 0] [0] /home/chao/git_repos/pin/intel64/bin/pinbin(_ZN8LEVEL_VM12SIGNALS_IMPL20HandlePhysicalSignalEPN7BARECRT8SIGXINFOEPN5PINVM11ISIGCONTEXTE+0x124) [0x3043aa1f4]
[S 0] [0] /home/chao/git_repos/pin/intel64/bin/pinbin(_ZN5PINVM28SIGNAL_DETAILS_LINUX_INTEL6415InternalHandlerEiPN7BARECRT8SIGXINFOEPv+0xe8) [0x304438c88]
[S 0] [0] /home/chao/git_repos/pin/intel64/bin/pinbin(BARECRT_SigReturnRt+0) [0x30446603c]
[S 0] [0] /home/chao/git_repos/rommelsv-zsim/build/opt/slab_alloc.h:108 / slab::SlabAlloc::alloc(unsigned long)
[S 0] [0] /home/chao/git_repos/rommelsv-zsim/build/opt/coherence_ctrls.cpp:109 / MESIBottomCC::processAccess(unsigned long, unsigned int, AccessType, unsigned long, unsigned int, unsigned int)
[S 0] [0] /home/chao/git_repos/rommelsv-zsim/build/opt/coherence_ctrls.h:472 / MESITerminalCC::processAccess(MemReq const&, int, unsigned long, unsigned long*)
[S 0] [0] /home/chao/git_repos/rommelsv-zsim/build/opt/cache.cpp:94 / Cache::access(MemReq&)
[S 0] [0] /home/chao/git_repos/rommelsv-zsim/build/opt/filter_cache.h:138 / FilterCache::replace(unsigned long, unsigned int, bool, unsigned long)
[S 0] [0] [0x7fffe3afc972]
C: Tool (or Pin) caused signal 11 at PC 0x7ffff632dfb7
[H] Child 368419 done
[H] Panic on build/opt/zsim_harness.cpp:123: Child 368419 (idx 0) exit was anomalous, killing simulation

Here is the config file I used.

sys = {
    cores = {
        simpleCore = {
            type = "Simple";
            cores = 8;
            dcache = "l1d";
            icache = "l1i";
        };
    };

lineSize = 64;

caches = {
    l1d = {
        caches = 8;
        size = 32768;
    };
    l1i = {
        caches = 8;
        size = 32768;
    };
    pfc = {
        isPrefetcher = True;
        prefetchers = 8;
        children = "l1d";
    };
    l2 = {
        caches = 8;
        size = 262144;
        children = "l1i|pfc";  // interleave
    };
    l3 = {
        caches = 1;
        size = 262144;
        children = "l2";
    };
};

};

sim = {
    phaseLength = 10000;
    // attachDebugger = True;
    schedQuantum = 100;  // switch threads frequently
    procStatsFilter = "l1.|l2.";
};

process0 = {
    command = "/home/chao/git_repos/Galois/build/debug/apps/bfs/bfs -algo=async /home/chao/git_repos/Galois/inputs/structured/rome99.gr -t 8"

command = "/home/chao/git_repos/zsim2/misc/ven_api/test3_instr"

command = "ls -l"

};

Could you give me some hints on how to fix this problem? I would really appreciate your help.

albertghtoun avatar Feb 15 '17 07:02 albertghtoun

Hi @rommelsv, I found that a very similar problem (an assertion failure) arises when I simulate multiple cores in the config file. Could you please provide a working config file that simulates multiple cores with the zsim prefetcher?

albertghtoun avatar Feb 19 '17 06:02 albertghtoun