std::system_error in ci/docker on calling of VerilatedCov::write

Open strau0106 opened this issue 1 year ago • 11 comments

Hi everyone,

After switching to a Docker container with a self-built Verilator version (Debian only ships 5.006, and I need 5.024 for struct /*verilator public*/), I am getting std::system_error exceptions whenever the built executable tries to write the coverage file.

Full backtrace:

warning: Section `.reg-xstate/532' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `bin/cpu_test'.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/532' in core file too small.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f99680a3e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f9968054fb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f996803f472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007f9968396919 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f99683a1e1a in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f99683a1e85 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f99683a20d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f99683995e6 in std::__throw_system_error(int) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00005616812992a5 in std::mutex::lock (this=0x561682684f70) at /usr/include/c++/12/bits/std_mutex.h:104
#10 VerilatedMutex::lock (this=this@entry=0x561682684f70) at /usr/local/share/verilator/include/verilated.h:196
#11 0x000056168129f1c8 in VerilatedLockGuard::VerilatedLockGuard (mutexr=..., this=0x7ffe2dd66ef0) at /usr/local/share/verilator/include/verilated.h:228
#12 VerilatedCovImp::write (this=0x561682684f60, filename="logs/cpu.dat") at /usr/local/share/verilator/include/verilated_cov.cpp:362
#13 0x000056168129cff7 in VerilatedCovContext::write (this=<optimized out>, filename="logs/cpu.dat") at /usr/local/share/verilator/include/verilated_cov.cpp:442
#14 0x000056168128eb00 in VerilatedCov::write (filename="logs/cpu.dat") at /usr/local/share/verilator/include/verilated_cov.h:168
#15 main (argc=<optimized out>, argv=<optimized out>) at ../../test/cpu_test.cpp:117
(gdb) 

I am really unsure what my next debugging steps would be here. I am also unsure whether the ./nptl/pthread_kill.c: No such file or directory. message is a problem, because gdb ran on my own system while the error occurs in a Docker container.

Here is the Dockerfile for that container:

FROM debian:bookworm-slim

HEALTHCHECK NONE

ENTRYPOINT []

ARG USER_NAME=verilator
ARG USER_HOME=/home/verilator
ARG USER_ID=1000
ARG VERILATOR_VERSION=v5.024

RUN groupadd verilator \
    && useradd -g verilator -m verilator -s /bin/bash \
    && apt-get update \
    && apt-get install --no-install-recommends -y sudo \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && echo verilator ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/verilator \
    && chmod 0440 /etc/sudoers.d/verilator

RUN apt-get update && \
  apt-get install --no-install-recommends -y \
                        libc6 \
                        autoconf \
                        bc \
                        bison \
                        build-essential \
                        ca-certificates \
                        ccache \
                        clang \
                        cmake \
                        flex \
                        gdb \
                        git \
                        gtkwave \
                        help2man \
                        libfl2 \
                        libfl-dev \
                        libgoogle-perftools-dev \
                        libsystemc \
                        libsystemc-dev \
                        numactl \
                        perl \
                        python3 \
                        wget \
                        z3 \
                        zlib1g \
                        zlib1g-dev && \
  # Removing documentation packages *after* installing them is kind of hacky,
  # but it only adds some overhead while building the image.
  apt-get --purge remove -y .\*-doc$ && \
  # Remove more unnecessary stuff
  apt-get clean -y && \
  rm -rf /var/lib/apt/lists/*

ENV HOME "${USER_HOME}"

USER "${USER_NAME}"

WORKDIR "${HOME}"

# Clone Verilator and build from source
RUN git clone https://github.com/verilator/verilator.git && \
  cd verilator && \
  git checkout ${VERILATOR_VERSION} && \
  autoconf && \
  ./configure && \
  make -j$(nproc) && \
  sudo make install
RUN  cd verilator/ && make test
# Clean up
RUN rm -rf "${HOME}/verilator"

Can you guide me to my next debugging steps or even have a solution for me? Thanks a lot.

strau0106 avatar Aug 14 '24 19:08 strau0106

Quickly confirming this also happens on your provided docker image verilator/verilator latest (v5.026) and v5.024

strau0106 avatar Aug 14 '24 19:08 strau0106

This issue also occurs when using the Verilator from the Debian apt repos.

Furthermore, I realised this is the only file where I am dumping FST. Could that be the issue? I dump with what I think is the "new method", contextp, instead of Verilated::

But why only in the Docker container? I don't get it.

strau0106 avatar Aug 15 '24 12:08 strau0106

Okay, I made some progress. It only crashes in the Docker container because that is the only place where I let Google Test export a report, so that GitLab can show it to me. I can now somewhat reproduce the crash and will try to pin the issue down further.

strau0106 avatar Aug 15 '24 13:08 strau0106

BTW I'd first suspect the problem is because the "logs" directory doesn't exist. As to why docker's involved I'll let you debug further.

wsnyder avatar Aug 15 '24 13:08 wsnyder

This has nothing to do with Docker anymore, and the logs directory exists. I let Verilator create it:

    Verilated::commandArgs(argc, argv);
    testing::InitGoogleTest(&argc, argv);
    auto res = RUN_ALL_TESTS();
    Verilated::mkdir("logs");
    VerilatedCov::write("logs/cpu.dat");
    return res;

Whenever I let Google Test write a test report for me, I experience this crash. This is the last code in verilated.h that is called before execution enters the standard library:

public:
  /// Construct mutex (without locking it)
  VerilatedMutex() = default;
  ~VerilatedMutex() = default;
  VL_UNCOPYABLE(VerilatedMutex);
  const VerilatedMutex& operator!() const { return *this; }  // For -fthread_safety
  /// Acquire/lock mutex
  void lock() VL_ACQUIRE() VL_MT_SAFE {
      // Try to acquire the lock by spinning.  If the wait is short,
      // avoids a trap to the OS plus OS scheduler overhead.
      if (VL_LIKELY(try_lock())) return;  // Short circuit loop
      for (int i = 0; i < VL_LOCK_SPINS; ++i) {
          if (VL_LIKELY(try_lock())) return;
          VL_CPU_RELAX();
      }
      // Spinning hasn't worked, pay the cost of blocking.
      m_mutex.lock();  // <===== HERE
  }
  

I suspect Google Test is trying to write the test reports at the same time, so the blocking mutex lock is reached because the write takes longer than VL_LOCK_SPINS spins.

@wsnyder do you have any suggestion on what I should do next?
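To make the quoted lock() easier to reason about, here is a minimal standalone sketch of the same spin-then-block pattern. It is not Verilator's code: kLockSpins, SpinThenBlockMutex and countWithThreads are hypothetical names, and std::this_thread::yield() is only a stand-in for VL_CPU_RELAX().

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin budget; Verilator's VL_LOCK_SPINS is defined elsewhere.
constexpr int kLockSpins = 100;

// Minimal mutex wrapper: try to take the lock cheaply a few times,
// then fall back to a blocking lock (the step marked HERE in the quote).
class SpinThenBlockMutex {
public:
    void lock() {
        if (m_mutex.try_lock()) return;  // Fast path: lock was free
        for (int i = 0; i < kLockSpins; ++i) {
            if (m_mutex.try_lock()) return;
            std::this_thread::yield();  // Stand-in for VL_CPU_RELAX()
        }
        m_mutex.lock();  // Spinning did not help, block in the OS
    }
    void unlock() { m_mutex.unlock(); }

private:
    std::mutex m_mutex;
};

// Increment a shared counter from several threads under the lock.
long countWithThreads(int threads, int itersPerThread) {
    SpinThenBlockMutex mtx;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&mtx, &counter, itersPerThread] {
            for (int i = 0; i < itersPerThread; ++i) {
                std::lock_guard<SpinThenBlockMutex> guard(mtx);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling countWithThreads(4, 10000) should return 40000 regardless of whether a given lock() call succeeded while spinning or had to block, which is the behaviour I would expect from the real lock as well.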

strau0106 avatar Aug 15 '24 15:08 strau0106

In fact, if I let my main thread sleep,

int main(int argc, char** argv) {
    Verilated::commandArgs(argc, argv);
    testing::InitGoogleTest(&argc, argv);
    auto res = RUN_ALL_TESTS();
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::this_thread::sleep_for(std::chrono::seconds(1));

    Verilated::mkdir("logs");
    VerilatedCov::write("logs/cpu.dat");
    return res;
}

I experience no crash anymore.

My gut says this is a bug in Verilator, but I have my workaround for now.

strau0106 avatar Aug 15 '24 15:08 strau0106

@wsnyder I am now simply observing that coverage files are empty when using timing functions.

Any clues?

strau0106 avatar Aug 20 '24 19:08 strau0106

I don't know, I'd suggest adding prints to see if the list of coverage points is empty for some reason, or if write isn't being called, or some other issue.

wsnyder avatar Aug 24 '24 12:08 wsnyder

Yes, there are no coverage points: m_items in VerilatedCovImp is empty. I'll investigate further.

strau0106 avatar Aug 25 '24 20:08 strau0106

Aw man, I know what's going on now: I do contextp.reset() to restart time, and with that I lose my coverage points.

@wsnyder is there any other way to start time at 0 again?

strau0106 avatar Aug 25 '24 20:08 strau0106

No, you need to reset the context. You can dump stats before you reset, then merge them.

wsnyder avatar Aug 25 '24 20:08 wsnyder
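
The "dump before reset, then merge" advice can be illustrated with a small standalone sketch. Nothing in it is Verilator API: FakeContext, CoverageMap, mergeInto and runTwoPhases are hypothetical stand-ins, and the coverage points are made up. In a real testbench the dump step would presumably be a coverage write such as VerilatedCov::write(...) with a distinct filename per phase, and the merge could be done afterwards with the verilator_coverage tool (its --write option produces a merged data file).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Coverage point name -> hit count (hypothetical, simplified format).
using CoverageMap = std::map<std::string, unsigned long>;

// Hypothetical stand-in for a simulation context that owns both the
// simulation time and the coverage counters, so resetting it loses both.
struct FakeContext {
    unsigned long time = 0;
    CoverageMap coverage;
};

// Merge step: add one phase's counters into the running total.
void mergeInto(CoverageMap& total, const CoverageMap& phase) {
    for (const auto& kv : phase) total[kv.first] += kv.second;
}

// Run two phases, restarting time between them without losing coverage.
CoverageMap runTwoPhases() {
    CoverageMap total;
    std::unique_ptr<FakeContext> contextp(new FakeContext);

    // Phase 1: pretend the simulation hit some coverage points.
    contextp->time = 100;
    contextp->coverage["alu.add"] = 3;
    mergeInto(total, contextp->coverage);  // Dump BEFORE the reset

    // Reset: time starts at 0 again and the old counters are destroyed.
    contextp.reset(new FakeContext);

    // Phase 2: more hits, collected in the fresh context.
    contextp->coverage["alu.add"] = 2;
    contextp->coverage["alu.sub"] = 1;
    mergeInto(total, contextp->coverage);  // Dump again at the end

    return total;
}
```

With these made-up numbers, runTwoPhases() returns a total of 5 hits for alu.add and 1 for alu.sub. Dropping the first mergeInto call would reproduce the symptom from this thread in miniature: everything counted before the reset is gone.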