Decoder for Time Information not found

Open marcocrxu opened this issue 2 years ago • 5 comments

Pilgrim uses several methods to compress time information (intervals and durations), but there seems to be no decoder for this information in pilgrim_app_generator.c. I find the time information that pilgrim_logger.c writes to intervals.dat and durations.dat confusing, and I have no idea how to decode it. Could you please give some examples of decoding these files? Many thanks.

marcocrxu, Mar 23 '23 07:03

The Pilgrim app generator (pilgrim_app_generator.c) is a code generator. Given Pilgrim traces, it tries to generate a C program that recovers the communication pattern. It relies on the order of the captured calls, not on the detailed time information.

I just added an example of decoding timestamps in pilgrim2text.c: https://github.com/pmodels/pilgrim/pull/29 (you need to rebuild Pilgrim and regenerate the traces).

So you may want to start from pilgrim2text.c. You can also try running pilgrim2text /path/to/your/trace-dir to see the outputs. The command will generate a _text directory under your traces directory.
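For orientation, the general idea of turning per-call intervals and durations back into absolute start and end times can be sketched as follows. This is not Pilgrim's actual decoder: the CallTime struct, the rebuild_call_times function, and the assumption that an interval is the gap between the start times of consecutive calls (with already-decompressed values in seconds) are all hypothetical. The real on-disk encoding is the one handled in pilgrim2text.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: start and end time of one traced call, in seconds. */
typedef struct {
    double tstart;
    double tend;
} CallTime;

/*
 * Rebuild absolute times from n (interval, duration) pairs.
 * ASSUMPTION (not taken from Pilgrim): intervals[i] is the gap between the
 * start of call i-1 and the start of call i, and intervals[0] is the offset
 * of the first call from time zero.
 */
void rebuild_call_times(const double *intervals, const double *durations,
                        size_t n, CallTime *out)
{
    assert(n == 0 || (intervals && durations && out));

    double start = 0.0;
    for (size_t i = 0; i < n; i++) {
        start += intervals[i];              /* running sum of the gaps */
        out[i].tstart = start;
        out[i].tend = start + durations[i]; /* a call ends after its duration */
    }
}
```

Under these assumptions, intervals {1.0, 2.0, 0.5} and durations {0.25, 0.5, 1.0} give start times 1.0, 3.0, 3.5 and end times 1.25, 3.5, 4.5. If the trace stores binned or otherwise compressed values, those would have to be expanded first.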

wangvsa, Mar 23 '23 17:03

By the way, the two papers below have all the details about Pilgrim: "Near-Lossless MPI Tracing and Proxy Application Autogeneration" and "Pilgrim: Scalable and (near) Lossless MPI Tracing".

wangvsa, Mar 23 '23 17:03

I have read your papers, and I am interested in Pilgrim. Your new PR helped me a lot, many thanks. However, I sometimes get a segmentation fault when generating a proxy with Pilgrim. The test programs come from FLASH, such as sedov-3d and stirturb. It seems that sym->val can be < 0 in some cases here:

void handle_one_symbol_pre(FILE* f, Symbol *sym, CallSignature *cst) {
    if(wt_loop) {
        if(sym->val >= 0)
            wt_loop_count -= get_wt_completed_reqs(&cst[sym->val]);
        /* sym->val may be < 0 here; indexing cst with it then causes the segmentation fault */
        if(cst[sym->val].func_id != wt_loop_call_id) {
            // .....
        }
    }
}

I tried returning immediately if sym->val < 0, but then the generated proxy is messed up. Do you know how to fix this? Many thanks in advance. Also, I found that the generated proxy uses the same buf for both the send and receive buffers in MPI_Reduce, which results in a runtime error reported by Open MPI. I made a small modification, using a separate buf_recv, to avoid this problem.
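On the sym->val crash: one way to avoid the out-of-bounds read without leaving the whole function is to make the table lookup itself return "no signature" for an invalid index. The sketch below is not a confirmed fix for Pilgrim. The trimmed-down Symbol and CallSignature types, the lookup_signature helper, the cst_len parameter, and the assumption that only non-negative values index the call signature table are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the generator's types. */
typedef struct { int val; } Symbol;
typedef struct { int func_id; } CallSignature;

/*
 * Return the call signature a symbol refers to, or NULL when the symbol's
 * value is not a valid index into cst (negative, or >= cst_len).
 * ASSUMPTION: only non-negative values index the call signature table.
 */
const CallSignature *lookup_signature(const Symbol *sym,
                                      const CallSignature *cst, int cst_len)
{
    assert(sym && cst && cst_len >= 0);

    if (sym->val < 0 || sym->val >= cst_len)
        return NULL;
    return &cst[sym->val];
}
```

A caller could then test the result once, for example `if (sig && sig->func_id != wt_loop_call_id)`, so that only the signature-dependent branch is skipped for such symbols. Whether the generated proxy is then correct depends on what a negative sym->val means in the grammar, which this sketch does not address.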

marcocrxu, Apr 03 '23 13:04

Here is a case where I return immediately in handle_one_symbol_pre. The function generated for the non-terminal is empty, which is abnormal. I set buf_0 = malloc(10000000); myself, because buf_0 = malloc(0); is sometimes generated.

image

marcocrxu, Apr 03 '23 13:04

Sorry for the late response; I was totally occupied by other projects. I will take a look at this, but it might take me a while.

wangvsa, Apr 24 '23 22:04