Checkpoint files are not generated

Open: rseac opened this issue 7 months ago • 1 comment

Before start

  • [x] I have read the XiangShan Documents.
  • [x] I have searched the previous issues and did not find anything relevant.
  • [x] I have searched the previous discussions and did not find anything relevant.
  • [x] I have reproduced the problem using the latest commit on the master branch.

Describe your problem

I am trying to generate checkpoints with NEMU so that I can run them on XiangShan. I am following the documented instructions, but no checkpoint files are produced. Profiling and clustering both appear to complete, yet the checkpoint generation step writes no files.

What did you do before

Setup tools

git clone https://github.com/OpenXiangShan/xs-env.git
cd /xs-env && sudo -s ./setup-tools.sh && ./setup.sh && source env.sh && source update-submodule.sh

Setup NEMU and simpoint

cd $NEMU_HOME
git submodule update --init

cd $NEMU_HOME/resource/simpoint/simpoint_repo
make clean
make

cd $NEMU_HOME
make clean
make riscv64-xs-cpt_defconfig
make -j 8

cd $NEMU_HOME/resource/gcpt_restore
make 

Set an example from nexus-am/apps for checkpoint

cd /xs-env/nexus-am/apps/hello/

Rework hello.c so that the traps are set.

#define DISABLE_TIME_INTR 0x100
#define NOTIFY_PROFILER 0x101
#define GOOD_TRAP 0x0

void nemu_signal(int a){
    asm volatile ("mv a0, %0\n\t"
                  ".insn r 0x6B, 0, 0, x0, x0, x0\n\t"
                  :
                  : "r"(a)
                  : "a0");
}
#include <klib.h>

int main()
{

    nemu_signal(DISABLE_TIME_INTR);
    nemu_signal(NOTIFY_PROFILER);
    printf("Hello, XiangShan!\n");
    nemu_signal(GOOD_TRAP);
    return 0;
}

Compile hello

make ARCH=riscv64-xs

Run the checkpoint steps

I used the following script.

#!/bin/bash

# prepare env

export NEMU_HOME=/xs-env/NEMU
export NEMU=$NEMU_HOME/build/riscv64-nemu-interpreter
export GCPT=$NEMU_HOME/resource/gcpt_restore/build/gcpt.bin
export SIMPOINT=$NEMU_HOME/resource/simpoint/simpoint_repo/bin/simpoint

export WORKLOAD_ROOT_PATH=/xs-env/nexus-am/apps/hello/build/
export LOG_PATH=$NEMU_HOME/hello/logs
export RESULT=$NEMU_HOME/hello_result
export profiling_result_name=simpoint-profiling
export PROFILING_RES=$RESULT/$profiling_result_name
export interval=$((2))

# Profiling
# using config: riscv64-xs-cpt_defconfig
profiling(){
    set -x
    workload=$1
    log=$LOG_PATH/profiling_logs
    mkdir -p $log

    $NEMU ${WORKLOAD_ROOT_PATH}/${workload}.bin \
        -D $RESULT -w $workload -C $profiling_result_name    \
        -b --simpoint-profile --cpt-interval ${interval} > $log/${workload}-out.txt 2>${log}/${workload}-err.txt
}

export -f profiling

# Cluster

cluster(){
    set -x
    workload=$1

    export CLUSTER=$RESULT/cluster/${workload}
    mkdir -p $CLUSTER

    random1=`head -20 /dev/urandom | cksum | cut -c 1-6`
    random2=`head -20 /dev/urandom | cksum | cut -c 1-6`

    log=$LOG_PATH/cluster_logs/cluster
    mkdir -p $log

    $SIMPOINT \
        -loadFVFile $PROFILING_RES/${workload}/simpoint_bbv.gz \
        -saveSimpoints $CLUSTER/simpoints0 -saveSimpointWeights $CLUSTER/weights0 \
        -inputVectorsGzipped -maxK 30 -numInitSeeds 2 -iters 1000 -seedkm ${random1} -seedproj ${random2} \
        > $log/${workload}-out.txt 2> $log/${workload}-err.txt
}

export -f cluster
# Checkpointing
# using config: riscv64-xs-cpt_defconfig
checkpoint(){
    set -x
    workload=$1

    export CLUSTER=$RESULT/cluster
    log=$LOG_PATH/checkpoint_logs
    mkdir -p $log
    $NEMU ${WORKLOAD_ROOT_PATH}/${workload}.bin \
         -D $RESULT -w ${workload} -C spec-cpt  \
         -b -S $CLUSTER --cpt-interval $interval \
         --checkpoint-format zstd > $log/${workload}-out.txt 2>$log/${workload}-err.txt
}

export -f checkpoint

profiling hello-riscv64-xs
cluster hello-riscv64-xs
checkpoint hello-riscv64-xs

The files I see generated

tree NEMU/hello*

NEMU/hello
`-- logs
    |-- checkpoint_logs
    |   |-- hello-riscv64-xs-err.txt
    |   `-- hello-riscv64-xs-out.txt
    |-- cluster_logs
    |   `-- cluster
    |       |-- hello-riscv64-xs-err.txt
    |       `-- hello-riscv64-xs-out.txt
    `-- profiling_logs
        |-- hello-riscv64-xs-err.txt
        `-- hello-riscv64-xs-out.txt
NEMU/hello_result
|-- cluster
|   `-- hello-riscv64-xs
|       |-- simpoints0
|       `-- weights0
|-- simpoint-profiling
|   `-- hello-riscv64-xs
|       `-- simpoint_bbv.gz
`-- spec-cpt
    `-- hello-riscv64-xs
        `-- 1
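
A quick way to confirm that nothing was written, as a minimal sketch: count the regular files under the checkpoint output directory. The helper name count_cpt_files is hypothetical, and the default path is an assumption taken from RESULT and -C spec-cpt in the script above.

```shell
#!/bin/bash
# Hypothetical helper: count regular files under a checkpoint output directory.
count_cpt_files() {
    local dir=$1
    # A missing directory counts as zero files.
    if [ ! -d "$dir" ]; then
        echo 0
        return
    fi
    # tr strips the padding that some wc implementations add.
    find "$dir" -type f | wc -l | tr -d ' '
}

# Assumed location: $RESULT/spec-cpt, as used by the checkpoint step above.
count_cpt_files "${RESULT:-/xs-env/NEMU/hello_result}/spec-cpt"
```

A result of 0 matches the tree above, where spec-cpt contains only the empty directory 1.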

Environment

  • XiangShan branch: master
  • XiangShan commit id: 4bbdccbb077840af5e1b65c7138d31af3966f625
  • NEMU commit id: 4a24b77a61505e34745667b1ad712a817b090cf8
  • SPIKE commit id:
  • Operating System: Ubuntu 22.04
  • gcc version: 11.4.0
  • mill version: 0.12.10
  • java version: 11.0.26

Additional context

I also tried this with the stream application (nexus-am/apps/stream), which has been used in some tutorials such as ASPLOS 2025, and hit the same problem.

rseac, May 08 '25 01:05

I think the cause of this issue is that the app in nexus-am runs entirely in M-mode, and NEMU requires the --cpt-mmode option to generate checkpoints in M-mode.

While this option does allow checkpoint generation, the generated checkpoints cannot be used for restore with emu, because resource/gcpt_restore in NEMU does not support restoring M-mode checkpoints.
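
As a rough sketch of what that would look like, the checkpoint step from the issue's script can be repeated with --cpt-mmode added. This assumes the option is a plain switch that takes no argument; check the NEMU help output for your commit. The function name checkpoint_mmode and the DRY_RUN switch are hypothetical additions so the command line can be inspected without running NEMU. The other variables come from the script in the issue, with placeholder defaults.

```shell
#!/bin/bash
# Sketch: the checkpoint step from the issue, plus --cpt-mmode.
# Assumption: --cpt-mmode is a switch with no argument.
# DRY_RUN is a hypothetical switch; when it is 1 (the default here),
# the command is printed instead of executed.
checkpoint_mmode() {
    local workload=$1
    local result=${RESULT:-./result}
    local cmd=(
        "${NEMU:-riscv64-nemu-interpreter}" "${WORKLOAD_ROOT_PATH:-.}/${workload}.bin"
        -D "$result" -w "$workload" -C spec-cpt
        -b -S "$result/cluster" --cpt-interval "${interval:-2}"
        --cpt-mmode
        --checkpoint-format zstd
    )
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s ' "${cmd[@]}"
        echo
    else
        "${cmd[@]}"
    fi
}

checkpoint_mmode hello-riscv64-xs
```

Setting DRY_RUN=0 runs the command for real. Per the caveat above, checkpoints produced this way are only useful for inspection, since gcpt_restore cannot restore them.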

Therefore, my suggestion is to wrap the workload you want to checkpoint inside OpenSBI and Linux, and run it as a user-space program.

Additionally, the stream workload used in the tutorial you mentioned is not built directly from nexus-am/apps/stream. Instead, as suggested above, it is packaged as a user-space program under Linux.

xyyy1420, May 09 '25 09:05