chipyard
Segfault running SimNetwork under verilator
Background Work
- [X] Yes, I searched the mailing list
- [X] Yes, I searched prior issues
- [X] Yes, I searched the documentation
Chipyard Version and Hash
Release: N/A Hash: ef71dfd40a5c12ca489760472209c02ac59b96ca
OS Setup
+ uname -a
Linux cerf 6.6.28-1-MANJARO #1 SMP PREEMPT_DYNAMIC Wed Apr 17 13:19:22 UTC 2024 x86_64 GNU/Linux
+ lsb_release -a
LSB Version: n/a
Distributor ID: ManjaroLinux
Description: Manjaro Linux
Release: 23.1.4
Codename: Vulcan
(partial) `printenv`
CONDA_EXE=/home/seth/miniforge3/bin/conda
_CE_M=
_CE_CONDA=
CONDA_PYTHON_EXE=/home/seth/miniforge3/bin/python
CONDA_SHLVL=2
CONDA_BACKUP_PATH=/home/seth/Code/src/github.com/ucb-bar/chipyard/software/firemarshal:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/bin:/home/seth/Code/src/github.com/ucb-bar/chipyard/software/firemarshal:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools/bin:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools/bin:/home/seth/Code/src/github.com/ucb-bar/chipyard/software/firemarshal:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools/bin:/home/seth/Code/src/github.com/ucb-bar/chipyard/software/firemarshal:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/bin:/home/seth/miniforge3/condabin:/usr/bin:/home/seth/.bun/bin:/home/seth/perl5/bin:/home/seth/Code/bin:/home/seth/.cargo/bin:/home/seth/.krew/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/home/seth/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/seth/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/android-sdk/cmdline-tools/latest/bin:/opt/android-sdk/platform-tools:/opt/android-sdk/tools:/opt/android-sdk/tools/bin:/usr/lib/emscripten:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/rustup/bin:/var/lib/snapd/snap/bin:/home/seth/Code/bin:/usr/local/kubebuilder/bin
JAVA_HOME=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/lib/jvm
JAVA_LD_LIBRARY_PATH=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/lib/jvm/lib/server
CONDA_PREFIX=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env
CONDA_DEFAULT_ENV=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env
CONDA_PROMPT_MODIFIER=(/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env)
CONDA_PREFIX_1=/home/seth/miniforge3
RISCV=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools
LD_LIBRARY_PATH=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools/lib
GSETTINGS_SCHEMA_DIR_CONDA_BACKUP=
GSETTINGS_SCHEMA_DIR=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/share/glib-2.0/schemas
XML_CATALOG_FILES=file:///home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/etc/xml/catalog file:///etc/xml/catalog
JAVA_HOME_CONDA_BACKUP=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/lib/jvm
JAVA_LD_LIBRARY_PATH_BACKUP=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/lib/jvm/lib/server
_=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/bin/printenv
`conda list`
# packages in environment at /home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_sysroot_linux-64_curr_repodata_hack 3 h69a702a_14 conda-forge
aiohttp 3.9.3 py310h2372a71_0 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
alabaster 0.7.16 pyhd8ed1ab_0 conda-forge
alsa-lib 1.2.11 hd590300_1 conda-forge
annotated-types 0.6.0 pyhd8ed1ab_0 conda-forge
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
archspec 0.2.3 pyhd8ed1ab_0 conda-forge
argcomplete 3.2.3 pyhd8ed1ab_0 conda-forge
asttokens 2.4.1 pypi_0 pypi
async-timeout 4.0.3 pyhd8ed1ab_0 conda-forge
atk-1.0 2.38.0 hd4edc92_1 conda-forge
attrs 23.2.0 pyh71513ae_0 conda-forge
autoconf 2.71 pl5321h2b4cb7a_1 conda-forge
aws-c-auth 0.7.8 h538f98c_2 conda-forge
aws-c-cal 0.6.9 h5d48c4d_2 conda-forge
aws-c-common 0.9.10 hd590300_0 conda-forge
aws-c-compression 0.2.17 h7f92143_7 conda-forge
aws-c-event-stream 0.3.2 h0bcb0bb_8 conda-forge
aws-c-http 0.7.14 hd268abd_3 conda-forge
aws-c-io 0.13.36 he0cd244_2 conda-forge
aws-c-mqtt 0.9.10 h35285c7_2 conda-forge
aws-c-s3 0.4.4 h0448019_0 conda-forge
aws-c-sdkutils 0.1.13 h7f92143_0 conda-forge
aws-checksums 0.1.17 h7f92143_6 conda-forge
aws-sam-translator 1.86.0 pyhd8ed1ab_0 conda-forge
aws-xray-sdk 2.13.0 pyhd8ed1ab_0 conda-forge
awscli 2.15.28 py310hff52083_0 conda-forge
awscrt 0.19.19 py310h43b4219_2 conda-forge
azure-core 1.30.1 pyhd8ed1ab_0 conda-forge
azure-identity 1.15.0 pyhd8ed1ab_0 conda-forge
babel 2.14.0 pyhd8ed1ab_0 conda-forge
bash 5.2.21 h7f99829_0 conda-forge
bash-completion 2.11 ha770c72_1 conda-forge
bc 1.07.1 h7f98852_0 conda-forge
bcrypt 4.1.2 py310hcb5633a_0 conda-forge
binutils 2.40 hdd6e379_0 conda-forge
binutils_impl_linux-64 2.40 hf600244_0 conda-forge
bison 3.8.2 h59595ed_0 conda-forge
blinker 1.7.0 pyhd8ed1ab_0 conda-forge
boltons 23.1.1 pyhd8ed1ab_0 conda-forge
boto3 1.34.61 pyhd8ed1ab_1 conda-forge
boto3-stubs 1.34.61 pyhd8ed1ab_0 conda-forge
botocore 1.34.61 pyge310_1234567_0 conda-forge
botocore-stubs 1.34.61 pyhd8ed1ab_0 conda-forge
brotli 1.1.0 hd590300_1 conda-forge
brotli-bin 1.1.0 hd590300_1 conda-forge
brotli-python 1.1.0 py310hc6cd4ac_1 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.27.0 hd590300_0 conda-forge
ca-certificates 2024.2.2 hbcca054_0 conda-forge
cachecontrol 0.14.0 pyhd8ed1ab_0 conda-forge
cachecontrol-with-filecache 0.14.0 pyhd8ed1ab_0 conda-forge
cachy 0.3.0 pyhd8ed1ab_1 conda-forge
cairo 1.18.0 h3faef2a_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py310h2fee648_0 conda-forge
cfgv 3.3.1 pyhd8ed1ab_0 conda-forge
cfn-lint 0.86.0 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
clang-format 17.0.6 default_hb11cfb5_3 conda-forge
clang-format-17 17.0.6 default_hb11cfb5_3 conda-forge
clang-tools 17.0.6 default_hb11cfb5_3 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
click-default-group 1.2.4 pyhd8ed1ab_0 conda-forge
clikit 0.6.2 pyhd8ed1ab_2 conda-forge
cloudpickle 3.0.0 pyhd8ed1ab_0 conda-forge
cmake 3.26.3 h077f3f9_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
conda 23.9.0 py310hff52083_2 conda-forge
conda-gcc-specs 13.2.0 h6a59387_5 conda-forge
conda-lock 1.4.0 pyhd8ed1ab_2 conda-forge
conda-package-handling 2.2.0 pyh38be061_0 conda-forge
conda-package-streaming 0.9.0 pyhd8ed1ab_0 conda-forge
conda-standalone 24.1.2 ha770c72_0 conda-forge
conda-tree 1.1.0 pyhd8ed1ab_2 conda-forge
constructor 3.7.0 pyh55f8243_0 conda-forge
contourpy 1.2.0 py310hd41b1e2_0 conda-forge
coreutils 9.4 hd590300_0 conda-forge
crashtest 0.4.1 pyhd8ed1ab_0 conda-forge
cryptography 40.0.2 py310h34c0648_0 conda-forge
ctags 5.8 h14c3975_1000 conda-forge
curl 7.88.1 hdc1c0ab_1 conda-forge
cycler 0.12.1 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h5008d03_3 conda-forge
diffutils 3.10 hf18258e_0 conda-forge
distlib 0.3.8 pyhd8ed1ab_0 conda-forge
distro 1.8.0 pyhd8ed1ab_0 conda-forge
docker-py 7.0.0 pyhd8ed1ab_0 conda-forge
docutils 0.19 py310hff52083_1 conda-forge
doit 0.36.0 pyhd8ed1ab_0 conda-forge
dtc 1.6.1 h166bdaf_2 conda-forge
ecdsa 0.18.0 pyhd8ed1ab_1 conda-forge
elfutils 0.187 h989201e_0 conda-forge
ensureconda 1.4.4 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.2.0 pyhd8ed1ab_2 conda-forge
expat 2.6.1 h59595ed_0 conda-forge
expect 5.45.4 h555a92e_0 conda-forge
fab-classic 1.19.2 pypi_0 pypi
file 5.39 h753d276_1 conda-forge
filelock 3.13.1 pyhd8ed1ab_0 conda-forge
findutils 4.6.0 h166bdaf_1001 conda-forge
flask 3.0.2 pyhd8ed1ab_0 conda-forge
flask_cors 3.0.10 pyhd3deb0d_0 conda-forge
flex 2.6.4 h58526e2_1004 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_1 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.49.0 py310h2372a71_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
frozenlist 1.4.1 py310h2372a71_0 conda-forge
fsspec 2024.2.0 pyhca7485f_0 conda-forge
gcc 13.2.0 hd6cf55c_3 conda-forge
gcc_impl_linux-64 13.2.0 h338b0a0_5 conda-forge
gdk-pixbuf 2.42.10 h829c605_5 conda-forge
gdspy 1.4 pypi_0 pypi
gengetopt 2.23 h9c3ff4c_0 conda-forge
gettext 0.21.1 h27087fc_0 conda-forge
giflib 5.2.1 h0b41bf4_3 conda-forge
git 2.44.0 pl5321h709897a_0 conda-forge
gitdb 4.0.11 pyhd8ed1ab_0 conda-forge
gitpython 3.1.42 pyhd8ed1ab_0 conda-forge
gmp 6.3.0 h59595ed_1 conda-forge
gmpy2 2.1.2 py310h3ec546c_1 conda-forge
gnutls 3.7.9 hb077bed_0 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
graphql-core 3.2.3 pyhd8ed1ab_0 conda-forge
graphviz 9.0.0 h78e8752_1 conda-forge
gtk2 2.24.33 h280cfa0_4 conda-forge
gts 0.7.6 h977cf35_4 conda-forge
gxx 13.2.0 hd6cf55c_3 conda-forge
gxx_impl_linux-64 13.2.0 h338b0a0_5 conda-forge
gzip 1.13 hd590300_0 conda-forge
hammer-vlsi 1.2.0 pypi_0 pypi
harfbuzz 8.3.0 h3d44ed6_0 conda-forge
html5lib 1.1 pyh9f0ad1d_0 conda-forge
humanfriendly 10.0 pyhd8ed1ab_6 conda-forge
icontract 2.6.6 pypi_0 pypi
icu 73.2 h59595ed_0 conda-forge
identify 2.5.35 pyhd8ed1ab_0 conda-forge
idna 3.6 pyhd8ed1ab_0 conda-forge
imagesize 1.4.1 pyhd8ed1ab_0 conda-forge
importlib-metadata 7.0.2 pyha770c72_0 conda-forge
importlib_metadata 7.0.2 hd8ed1ab_0 conda-forge
importlib_resources 6.3.0 pyhd8ed1ab_0 conda-forge
iniconfig 2.0.0 pyhd8ed1ab_0 conda-forge
itsdangerous 2.1.2 pyhd8ed1ab_0 conda-forge
jaraco.classes 3.3.1 pyhd8ed1ab_0 conda-forge
jeepney 0.8.0 pyhd8ed1ab_0 conda-forge
jinja2 3.1.3 pyhd8ed1ab_0 conda-forge
jmespath 1.0.1 pyhd8ed1ab_0 conda-forge
joserfc 0.9.0 pyhd8ed1ab_0 conda-forge
jq 1.7.1 hd590300_0 conda-forge
jschema-to-python 1.2.3 pyhd8ed1ab_0 conda-forge
jsondiff 2.0.0 pyhd8ed1ab_0 conda-forge
jsonpatch 1.33 pyhd8ed1ab_0 conda-forge
jsonpickle 3.0.2 pyhd8ed1ab_1 conda-forge
jsonpointer 2.4 py310hff52083_3 conda-forge
jsonschema 4.21.1 pyhd8ed1ab_0 conda-forge
jsonschema-path 0.3.2 pyhd8ed1ab_0 conda-forge
jsonschema-specifications 2023.7.1 pyhd8ed1ab_0 conda-forge
junit-xml 1.9 pyh9f0ad1d_0 conda-forge
kernel-headers_linux-64 3.10.0 h4a8ded7_14 conda-forge
keyring 24.3.1 py310hff52083_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.5 py310hd41b1e2_1 conda-forge
krb5 1.20.1 h81ceb04_0 conda-forge
lazy-object-proxy 1.10.0 py310h2372a71_0 conda-forge
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20240116.1 cxx17_h59595ed_2 conda-forge
libarchive 3.5.2 hada088e_3 conda-forge
libblas 3.9.0 21_linux64_openblas conda-forge
libbrotlicommon 1.1.0 hd590300_1 conda-forge
libbrotlidec 1.1.0 hd590300_1 conda-forge
libbrotlienc 1.1.0 hd590300_1 conda-forge
libcblas 3.9.0 21_linux64_openblas conda-forge
libclang-cpp17 17.0.6 default_hb11cfb5_3 conda-forge
libclang13 17.0.6 default_ha2b6cf4_3 conda-forge
libcups 2.3.3 h36d4200_3 conda-forge
libcurl 7.88.1 hdc1c0ab_1 conda-forge
libdeflate 1.19 hd590300_0 conda-forge
libdwarf 0.0.0.20190110_28_ga81397fc4 h753d276_0 ucb-bar
libdwarf-dev 0.0.0.20190110_28_ga81397fc4 h753d276_0 ucb-bar
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.1 h59595ed_0 conda-forge
libfdt 1.6.1 h166bdaf_2 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-devel_linux-64 13.2.0 ha9c7c90_105 conda-forge
libgcc-ng 13.2.0 h807b86a_5 conda-forge
libgcrypt 1.10.3 hd590300_0 conda-forge
libgd 2.3.3 h119a65a_9 conda-forge
libgfortran-ng 13.2.0 h69a702a_5 conda-forge
libgfortran5 13.2.0 ha4646dd_5 conda-forge
libgirepository 1.78.1 h003a4f0_1 conda-forge
libglib 2.80.0 hf2295e7_0 conda-forge
libgomp 13.2.0 h807b86a_5 conda-forge
libgpg-error 1.48 h71f35ed_0 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libidn2 2.3.7 hd590300_0 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblapack 3.9.0 21_linux64_openblas conda-forge
libllvm17 17.0.6 hb3ce162_1 conda-forge
libmagic 5.39 h753d276_1 conda-forge
libmicrohttpd 0.9.77 h97afed2_0 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.26 pthreads_h413a1c8_0 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libprotobuf 4.25.3 h08a7969_0 conda-forge
librsvg 2.56.3 he3f83f7_1 conda-forge
libsanitizer 13.2.0 h7e041cc_5 conda-forge
libsecret 0.18.8 h329b89f_2 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libsqlite 3.45.2 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-devel_linux-64 13.2.0 ha9c7c90_105 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_5 conda-forge
libtasn1 4.19.0 h166bdaf_0 conda-forge
libtiff 4.6.0 ha9c0a0a_2 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
libusb1 2.0.1 pyhd8ed1ab_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.48.0 hd590300_0 conda-forge
libwebp 1.3.2 h658648e_1 conda-forge
libwebp-base 1.3.2 hd590300_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.5 h232c23b_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
livereload 2.6.3 pyh9f0ad1d_0 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
lzo 2.10 h516909a_1000 conda-forge
lzop 1.04 h3753786_2 conda-forge
m4 1.4.18 h516909a_1001 conda-forge
make 4.3 hd18ef5c_1 conda-forge
markupsafe 2.1.5 py310h2372a71_0 conda-forge
matplotlib-base 3.8.3 py310h62c0568_0 conda-forge
mock 5.1.0 pypi_0 pypi
more-itertools 10.2.0 pyhd8ed1ab_0 conda-forge
mosh 1.4.0 pl5321h7cc048c_8 conda-forge
moto 5.0.3 pyhd8ed1ab_0 conda-forge
mpc 1.3.1 hfe3b2da_0 conda-forge
mpfr 4.2.1 h9458935_0 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
msal 1.27.0 pyhd8ed1ab_0 conda-forge
msal_extensions 1.1.0 py310hff52083_1 conda-forge
msgpack-python 1.0.7 py310hd41b1e2_0 conda-forge
multidict 6.0.5 py310h2372a71_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mypy 1.9.0 py310h2372a71_0 conda-forge
mypy-boto3-s3 1.34.14 pyhd8ed1ab_0 conda-forge
mypy_boto3_ec2 1.34.61 pyhd8ed1ab_0 conda-forge
mypy_extensions 1.0.0 pyha770c72_0 conda-forge
ncurses 6.4 h59595ed_2 conda-forge
nettle 3.9.1 h7ab15ed_0 conda-forge
networkx 3.2.1 pyhd8ed1ab_0 conda-forge
nodeenv 1.8.0 pyhd8ed1ab_0 conda-forge
numpy 1.26.4 py310hb13e2d6_0 conda-forge
oniguruma 6.9.9 hd590300_0 conda-forge
open_pdks.sky130a 1.0.471_0_g97d0844 20240223_100318 litex-hub
openapi-schema-validator 0.6.2 pyhd8ed1ab_0 conda-forge
openapi-spec-validator 0.7.1 pyhd8ed1ab_0 conda-forge
openjdk 20.0.2 haa376d0_2 conda-forge
openjpeg 2.5.2 h488ebb8_0 conda-forge
openssl 3.2.1 hd590300_0 conda-forge
p11-kit 0.24.1 hc5aa10d_0 conda-forge
packaging 24.0 pyhd8ed1ab_0 conda-forge
pandas 2.2.1 py310hcc13569_0 conda-forge
pango 1.52.1 ha41ecd1_0 conda-forge
paramiko 3.4.0 pyhd8ed1ab_0 conda-forge
paramiko-ng 2.8.10 pypi_0 pypi
pastel 0.2.1 pyhd8ed1ab_0 conda-forge
patch 2.7.6 h7f98852_1002 conda-forge
pathable 0.4.3 pyhd8ed1ab_0 conda-forge
pbr 6.0.0 pyhd8ed1ab_0 conda-forge
pcre2 10.43 hcad00b1_0 conda-forge
perl 5.32.1 7_hd590300_perl5 conda-forge
pillow 10.2.0 py310h01dd4db_0 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pixman 0.43.2 h59595ed_0 conda-forge
pkginfo 1.10.0 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge
platformdirs 4.2.0 pyhd8ed1ab_0 conda-forge
pluggy 1.4.0 pyhd8ed1ab_0 conda-forge
popt 1.16 h0b475e3_2002 conda-forge
portalocker 2.8.2 py310hff52083_1 conda-forge
pre-commit 3.6.2 pyha770c72_0 conda-forge
prompt-toolkit 3.0.38 pyha770c72_0 conda-forge
prompt_toolkit 3.0.38 hd8ed1ab_0 conda-forge
psutil 5.9.8 py310h2372a71_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pyasn1 0.5.1 pyhd8ed1ab_0 conda-forge
pycairo 1.26.0 py310hda9f760_0 conda-forge
pycosat 0.6.6 py310h2372a71_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pydantic 1.10.14 pypi_0 pypi
pydantic-core 2.16.3 py310hcb5633a_0 conda-forge
pygments 2.17.2 pyhd8ed1ab_0 conda-forge
pygobject 3.48.1 py310h30b043a_0 conda-forge
pyjwt 2.8.0 pyhd8ed1ab_1 conda-forge
pylddwrap 1.2.2 pypi_0 pypi
pylev 1.4.0 pyhd8ed1ab_0 conda-forge
pynacl 1.5.0 py310h2372a71_3 conda-forge
pyopenssl 23.1.1 pyhd8ed1ab_0 conda-forge
pyparsing 3.1.2 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pytest 8.1.1 pyhd8ed1ab_0 conda-forge
pytest-dependency 0.5.1 pyh9f0ad1d_0 conda-forge
pytest-mock 3.12.0 pyhd8ed1ab_0 conda-forge
python 3.10.13 hd12c33a_1_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-graphviz 0.20.1 pyh22cad53_0 conda-forge
python-jose 3.3.0 pyh6c4a22f_1 conda-forge
python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge
python_abi 3.10 4_cp310 conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pywin32-on-windows 0.1.0 pyh1179c8e_3 conda-forge
pyyaml 6.0.1 py310h2372a71_1 conda-forge
qemu 5.0.0 hb15d774_0 ucb-bar
readline 8.2 h8228510_1 conda-forge
referencing 0.30.2 pyhd8ed1ab_0 conda-forge
regex 2023.12.25 py310h2372a71_0 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
responses 0.25.0 pyhd8ed1ab_0 conda-forge
rfc3339-validator 0.1.4 pyhd8ed1ab_0 conda-forge
rhash 1.4.3 hd590300_2 conda-forge
riscv-tools 1.0.6 0_h1234567_g56c29e0 ucb-bar
rpds-py 0.18.0 py310hcb5633a_0 conda-forge
rsa 4.9 pyhd8ed1ab_0 conda-forge
rsync 3.2.7 h70740c4_0 conda-forge
ruamel-yaml 0.17.40 pypi_0 pypi
ruamel.yaml.clib 0.2.7 py310h2372a71_2 conda-forge
s2n 1.4.0 h06160fa_0 conda-forge
s3fs 0.4.2 py_0 conda-forge
s3transfer 0.10.0 pyhd8ed1ab_0 conda-forge
sarif-om 1.0.4 pyhd8ed1ab_0 conda-forge
sbt 1.9.7 hd8ed1ab_0 conda-forge
screen 4.8.0 he28a2e2_0 conda-forge
secretstorage 3.3.3 py310hff52083_2 conda-forge
sed 4.8 he412f7d_0 conda-forge
setuptools 69.2.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
smmap 5.0.0 pyhd8ed1ab_0 conda-forge
snowballstemmer 2.2.0 pyhd8ed1ab_0 conda-forge
sphinx 7.2.6 pyhd8ed1ab_0 conda-forge
sphinx-autobuild 2024.2.4 pyhd8ed1ab_0 conda-forge
sphinx_rtd_theme 2.0.0 pyha770c72_0 conda-forge
sphinxcontrib-applehelp 1.0.8 pyhd8ed1ab_0 conda-forge
sphinxcontrib-devhelp 1.0.6 pyhd8ed1ab_0 conda-forge
sphinxcontrib-htmlhelp 2.0.5 pyhd8ed1ab_0 conda-forge
sphinxcontrib-jquery 4.1 pyhd8ed1ab_0 conda-forge
sphinxcontrib-jsmath 1.0.1 pyhd8ed1ab_0 conda-forge
sphinxcontrib-qthelp 1.0.7 pyhd8ed1ab_0 conda-forge
sphinxcontrib-serializinghtml 1.1.10 pyhd8ed1ab_0 conda-forge
sqlite 3.45.2 h2c6b66d_0 conda-forge
sshpubkeys 3.3.1 pyhd8ed1ab_0 conda-forge
sty 1.0.0 pyhd8ed1ab_0 conda-forge
sure 2.0.1 pypi_0 pypi
sympy 1.12 pypyh9d50eac_103 conda-forge
sysroot_linux-64 2.17 h4a8ded7_14 conda-forge
tar 1.34 hb2e2bae_1 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tomlkit 0.12.4 pyha770c72_0 conda-forge
toolz 0.12.1 pyhd8ed1ab_0 conda-forge
tornado 6.4 py310h2372a71_0 conda-forge
tqdm 4.66.2 pyhd8ed1ab_0 conda-forge
truststore 0.8.0 pyhd8ed1ab_0 conda-forge
types-awscrt 0.20.5 pyhd8ed1ab_0 conda-forge
types-pytz 2024.1.0.20240203 pyhd8ed1ab_0 conda-forge
types-pyyaml 6.0.12.20240311 pyhd8ed1ab_0 conda-forge
types-requests 2.31.0.6 pyhd8ed1ab_0 conda-forge
types-s3transfer 0.10.0 pypi_0 pypi
types-urllib3 1.26.25.14 pyhd8ed1ab_0 conda-forge
typing-extensions 4.10.0 hd8ed1ab_0 conda-forge
typing_extensions 4.10.0 pyha770c72_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
ukkonen 1.0.1 py310hd41b1e2_4 conda-forge
unicodedata2 15.1.0 py310h2372a71_0 conda-forge
unzip 6.0 h7f98852_3 conda-forge
urllib3 1.26.18 pyhd8ed1ab_0 conda-forge
verilator 5.022 h7cd9344_0 conda-forge
vim 9.1.0041 py310pl5321he660f0e_0 conda-forge
virtualenv 20.25.1 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge
webencodings 0.5.1 pyhd8ed1ab_2 conda-forge
websocket-client 1.7.0 pyhd8ed1ab_0 conda-forge
werkzeug 3.0.1 pyhd8ed1ab_0 conda-forge
wget 1.20.3 ha35d2d1_1 conda-forge
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
which 2.21 h0b41bf4_1 conda-forge
wrapt 1.16.0 py310h2372a71_0 conda-forge
xmltodict 0.13.0 pyhd8ed1ab_0 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.1.1 hd590300_0 conda-forge
xorg-libsm 1.2.4 h7391055_0 conda-forge
xorg-libx11 1.8.7 h8ee46fc_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-libxi 1.7.10 h7f98852_0 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-libxt 1.3.0 hd590300_1 conda-forge
xorg-libxtst 1.2.3 h7f98852_1002 conda-forge
xorg-recordproto 1.14.2 h7f98852_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xxhash 0.8.0 h7f98852_3 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.9.4 py310h2372a71_0 conda-forge
zipp 3.17.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zstandard 0.22.0 py310h1275a96_0 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
Other Setup
Followed the "setting up the repository" guide, and added this to PeripheralDeviceConfigs.scala:
class TapNICRocketConfig extends Config(
new chipyard.harness.WithSimNetwork ++
new icenet.WithIceNIC ++
new freechips.rocketchip.subsystem.WithNBigCores(1) ++
new chipyard.config.AbstractConfig)
Current Behavior
When I build with make -C sims/verilator CONFIG=TapNICRocketConfig VERILATOR_THREADS=N for any N larger than 1, I end up with a simulator program that dies with a SIGSEGV almost as soon as I launch it.
As a consequence (and to get at what I'm really after), I'm limited to the single-threaded simulator, where running the pingd.c test results in a ~2s ping on my system. That's longer than ping's default interval of one second, so running ping with no arguments against the single-threaded RTL simulation is effectively a DoS: it sends ICMP echo requests at about 2x the rate the simulator can sustain.
Expected Behavior
Simply, I expected to be able to "throw more threads at it," as most of the time seems to be going to front-end stalls due to icache misses, something that multiple threads address nicely by expanding the effective available icache space.
More broadly, I suppose I expected there to be a way to get to workable performance of the RTL model for functional simulation of simple network nodes without custom hardware, proprietary software, or an FPGA-equipped cloud instance. Perhaps my expectation that such a path exists through verilator is worth discussing here, too?
Other Information
coredumpctl debug suggests this is because the thread-local context_t isn't fully initialized in all threads:
Core was generated by `/home/seth/Code/src/github.com/ucb-bar/chipyard/sims/verilator/simulator-chipya'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000056067f92ac60 in context_t::switch_to (this=0x56068194f370) at ../fesvr/context.cc:86
86 cur = this;
[Current thread is 1 (Thread 0x7fece98bb6c0 (LWP 4017918))]
(gdb) bt
#0 0x000056067f92ac60 in context_t::switch_to (this=0x56068194f370) at ../fesvr/context.cc:86
#1 0x000056067f3dda89 in network_tick ()
#2 0x000056067f555b96 in VTestDriver___024unit____Vdpiimwrap_network_tick_TOP____024unit(unsigned char, unsigned long, unsigned char, unsigned char&, unsigned long&, unsigned char&, unsigned long&) ()
#3 0x000056067f739e90 in VTestDriver___024root___nba_sequent__TOP__1899(VTestDriver___024root*) ()
#4 0x000056067f44d8b6 in VTestDriver___024root____Vthread__nba__2(void*, bool) ()
#5 0x000056067f415cb1 in VlWorkerThread::workerLoop() ()
#6 0x00007feceb8f0e95 in std::execute_native_thread_routine (__p=<optimized out>)
at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#7 0x00007feceb5da55a in ?? () from /usr/lib/libc.so.6
#8 0x00007feceb657a5c in ?? () from /usr/lib/libc.so.6
(gdb) p this
$1 = (context_t * const) 0x56068194f370
(gdb) p cur
$2 = (context_t *) 0x0
I'm not familiar with the usercontext patterns/types used by the fesvr to implement what appears to be "green threads" (really, co-routines), but I do see in the verilator docs that in multithreaded verilator it's the verilated model which creates and manages all N-1 threads except for whatever called eval:
> With --threads {N}, where N is at least 2, the generated model will be designed to run in parallel on N threads. The thread calling eval() provides one of those threads, and the generated model will create and manage the other N-1 threads. ... When making frequent use of DPI imported functions in a multithreaded model, it may be beneficial to performance to adjust the --instr-count-dpi option based on some experimentation. This influences the partitioning of the model by adjusting the assumed execution time of DPI imports.
And the latter bit suggests to me that while under some conditions the DPI functions may be called from the same threads, I see no guarantees that DPI from the always blocks in a module will be called using the same thread as the initial blocks (which the current implementation implicitly assumes). I suspect that one more degree of refinement around "the model's threads ought to be more compatible with the fesvr's coroutine implementation backing the DPI SimNetwork implementation" would help here, not least in identifying whether this is even a crash that chipyard itself has any leverage over.
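One way to check that assumption is to log the calling thread in both entry points and compare. The following is a minimal C++ sketch of that idea only; `sketch_init`, `sketch_tick`, and `init_and_tick_share_thread` are hypothetical stand-ins, not the SimNetwork DPI functions, and the worker thread here is spawned by hand rather than by a Verilated model.

```cpp
#include <iostream>
#include <thread>

// Records which thread ran each (hypothetical) entry point.
struct CallLog {
    std::thread::id init_thread;
    std::thread::id tick_thread;
};

static CallLog g_log;

// Stand-in for an init-style DPI function: remember and print the caller.
void sketch_init() {
    g_log.init_thread = std::this_thread::get_id();
    std::cout << "init (thread=" << g_log.init_thread << ")\n";
}

// Stand-in for a tick-style DPI function: remember and print the caller.
void sketch_tick() {
    g_log.tick_thread = std::this_thread::get_id();
    std::cout << "tick (thread=" << g_log.tick_thread << ")\n";
}

// Calls init on the current thread, then tick either on the same thread or
// on a freshly spawned one, and reports whether both ran on the same thread.
bool init_and_tick_share_thread(bool use_worker) {
    sketch_init();
    if (use_worker) {
        std::thread worker(sketch_tick);
        worker.join();
    } else {
        sketch_tick();
    }
    return g_log.init_thread == g_log.tick_thread;
}
```

Calling `init_and_tick_share_thread(true)` prints two different thread ids, which is the situation the backtrace above suggests the model can produce.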
I'm opening this here because I believe it's the right home for this issue, since it seems that even if verilator or fesvr exposed a callback or config option that would affect the outcome there'd still need to be a change here in chipyard to take advantage of it, but I'm very new to this space and would welcome your guidance.
Oh, I also meant to mention: I found someone on the mailing list with a similar-looking symptom ( https://groups.google.com/g/chipyard/c/i0pNR4t8HFA/m/NBMP4fcsAQAJ ), but given that they reported 1) the crash occurred in what appears to be an initial block rather than the nba driving the network_tick, and 2) solving the problem by changing the name of the connected bus I suspect that is a different issue, though perhaps still somewhere in SimNetwork.cc & friends.
I wonder if adding a lock to the network_tick and network_init functions would be sufficient.
I think it depends on what you mean. I noticed that the cur that's 0x0 in gdb there is declared as `static __thread context_t* cur;`. Since it's a thread-local, I read that as saying the faulting thread's copy of that storage was never initialized. No other thread can (well, should) access the faulting thread's local storage, so locking to wait for it to be constructed would probably just hang.
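To illustrate why a lock alone would not help, here is a small C++ sketch of the thread-local behavior being described. `Ctx`, `init_on_this_thread`, and `cur_is_set_on_new_thread` are made-up names for illustration; this is not the fesvr code, only the same storage class in isolation.

```cpp
#include <thread>

// Hypothetical stand-in for a context object.
struct Ctx {
    int id;
};

// One copy of this pointer exists per thread. A thread that never assigns
// its own copy sees a null pointer, whatever other threads have stored.
static thread_local Ctx* cur = nullptr;

// Stand-in for an init routine: populates only the calling thread's copy.
void init_on_this_thread(Ctx* c) {
    cur = c;
}

// Spawns a new thread and reports whether that thread sees a non-null `cur`.
bool cur_is_set_on_new_thread() {
    bool seen = true;
    std::thread t([&seen] { seen = (cur != nullptr); });
    t.join();
    return seen;
}
```

After `init_on_this_thread(&c)` on one thread, `cur_is_set_on_new_thread()` still returns false: the new thread has its own, untouched copy, so there is nothing for it to wait on.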
If you mean "instead of using a ucontext/coroutine thing, set up a cond-with-lock in network_init that parks a pthread until signaled by network_tick", then I think there's a potentially fruitful path there: there's even a bit of an example implementation in the fesvr's context_t as an alternative to using a ucontext, albeit one that doesn't look directly usable.
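As a rough sketch of that "park a thread until signaled" shape, assuming nothing about how fesvr's own alternative is written: the class below keeps a real thread blocked on a condition variable and lets it run one step each time some other thread calls `resume()`. `ParkedWorker` and its members are hypothetical names.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical coroutine substitute: a real thread that only runs while a
// caller is blocked inside resume(). Not safe for concurrent resume() calls.
class ParkedWorker {
public:
    // `step` runs once per resume(), always on the worker thread.
    explicit ParkedWorker(std::function<void()> step)
        : step_(std::move(step)), thread_([this] { run(); }) {}

    ~ParkedWorker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
            worker_turn_ = true;
        }
        cv_.notify_all();
        thread_.join();
    }

    // Callable from any thread: wake the worker, then block until it parks.
    void resume() {
        std::unique_lock<std::mutex> lock(m_);
        worker_turn_ = true;
        cv_.notify_all();
        cv_.wait(lock, [this] { return !worker_turn_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return worker_turn_; });
            if (stop_) return;
            step_();               // the caller is blocked in resume() here
            worker_turn_ = false;  // hand control back and park again
            cv_.notify_all();
        }
    }

    std::function<void()> step_;
    std::mutex m_;
    std::condition_variable cv_;
    bool worker_turn_ = false;
    bool stop_ = false;
    std::thread thread_;  // declared last so run() sees initialized members
};
```

Because the worker's state lives on its own thread, `resume()` does not depend on any thread-local belonging to the caller, so it behaves the same whichever thread the tick happens to run on.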
Between repair and replace, I'm personally leaning towards trying to figure out what the fesvr's context_t thing wants, since there's a handful of other uses of it in chipyard already (e.g. the spike tile) that would either suffer from a similar issue or possibly provide a solution. I'm hoping to continue posting my notes here as I learn my way through the fesvr and verilator threading models, unless you'd rather I didn't, of course!
> unless you'd rather I didn't, of course!
I suspect the solution to the problem does not require messing around in context_t. I believe several other devices use FESVR's context_t and behave correctly in multithreaded sims.
Perhaps! I did see that other devices made use of context_t, which is what leads me to want to understand the problem a little better. I found certain --threads counts do in fact produce a simulation that works for a given model using SimNetwork; not just 1 (which always works), but sometimes the model will work indefinitely with --threads 2 and crash immediately with --threads 3.
Two especially relevant details I've noticed so far:
- The verilator multithreading model appears to schedule micro-tasks statically; i.e. the same thread always resolves the same DPI-C call for a given model
- The crash occurs inside `context_t` when the thread-local `cur` variable is NULL (0x0). `cur` usually looks like it gets initialized as a side-effect of calling `init` in that thread (for any context_t instance, I think?)
I think it's the combination of these two that's causing the behavior I'm seeing: when the scheduler happens to place the network_init and network_tick DPI calls into the same thread's work queue (P=100% with one thread, ~50% with two, ~33% with three, etc.), there's no crash: network_init populates the thread-local, and network_tick uses it.
If I'm further right in saying that any call to static context_t::current() in a thread "pre-warms" it, then by adding more independently-scheduled instances (like, say, 2x ice nics + a block device + 8 spike tiles) we might rapidly (but asymptotically) approach a 100% chance that some initial block populates each thread's cur storage for a given number of threads. Could that account (directly or indirectly) for your experience that multi-threaded sims with context_t work fine?
Hmm, well, yes and no to that last question. Adding a spike tile[^1] did perturb the scheduler enough that the simulation worked at least once[^2] with VERILATOR_THREADS=3:
[UART] UART0 is here (stdin/stdout).
network init (tid=198488)
No tap interface provided
Constructing spike processor_t (tid=198490)
Done constructing spike processor
network tick (tid=198488)
- /home/seth/Code/src/github.com/ucb-bar/chipyard/sims/verilator/generated-src/chipyard.harness.TestHarness.TapNICRocketConfig/gen-collateral/TestDriver.v:158: Verilog $finish
and failed with VERILATOR_THREADS=4:
network init (tid=205548)
No tap interface provided
Constructing spike processor_t (tid=205550)
Done constructing spike processor
network tick (tid=205551)
zsh: segmentation fault (core dumped) ./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig
But, adding/removing cores doesn't change which threads do the initialization:
Constructing spike processor_t (tid=219082)
Done constructing spike processor
Constructing spike processor_t (tid=219082)
Done constructing spike processor
Constructing spike processor_t (tid=219082)
Done constructing spike processor
Constructing spike processor_t (tid=219082)
Done constructing spike processor
And, experimenting also provided a counterexample to my speculation that any context_t::init caller in a thread would suffice:
[UART] UART0 is here (stdin/stdout).
network init (tid=192456)
No tap interface provided
Constructing spike processor_t (tid=192457)
Done constructing spike processor
network tick (tid=192457)
zsh: segmentation fault (core dumped) ./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig
Also, it seems that bdev is vulnerable to the same crash (as long as the sim is run with +blkdev=somefile, otherwise the blkdev never inits or ticks):
bdev init (tid=312407)
[UART] UART0 is here (stdin/stdout).
...
bdev tick (tid=312410)
zsh: segmentation fault (core dumped) ./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig +permissive
$ coredumpctl debug
...
Core was generated by `./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig +permissive +blk'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000556201ef4530 in context_t::switch_to (this=0x55620306f9e0) at ../fesvr/context.cc:86
86 cur = this;
[Current thread is 1 (Thread 0x7fcebc80c6c0 (LWP 312410))]
(gdb) bt
#0 0x0000556201ef4530 in context_t::switch_to (this=0x55620306f9e0) at ../fesvr/context.cc:86
#1 0x00005562015cecbe in block_device_tick ()
#2 0x00005562017a7206 in VTestDriver___024unit____Vdpiimwrap_block_device_tick_TOP____024unit(unsigned char, unsigned char&, unsigned char, unsigned int, unsigned int, unsigned int, unsigned char, unsigned char&, unsigned long, unsigned int, unsigned char&, unsigned char, unsigned long&, unsigned int&) ()
#3 0x00005562019a38b0 in VTestDriver___024root___nba_sequent__TOP__1888(VTestDriver___024root*) ()
#4 0x0000556201646d0f in VTestDriver___024root____Vthread__nba__2(void*, bool) ()
#5 0x000055620160ce51 in VlWorkerThread::workerLoop() ()
#6 0x00007fcebe86fe95 in std::execute_native_thread_routine (__p=<optimized out>)
at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#7 0x00007fcebe53e55a in ?? () from /usr/lib/libc.so.6
#8 0x00007fcebe5bba5c in ?? () from /usr/lib/libc.so.6
(gdb) p cur
$1 = (context_t *) 0x0
It seems like the spike tile is in fact the odd one out; since it's driven from a single DPI-C entrypoint that inits itself on demand, it's not possible(?) for it to be scheduled on to two different threads by the Verilator microtask scheduler.
[^1]: so the new config is
```scala
class TapNICRocketConfig extends Config(
  new chipyard.WithNSpikeCores(4) ++
  new chipyard.harness.WithSimNetwork ++
  new icenet.WithIceNIC ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new chipyard.config.AbstractConfig)
```
[^2]: I'm testing with:
touch sims/verilator/generated-src/chipyard.harness.TestHarness.TapNICRocketConfig/gen-collateral/filelist.f && MAKEFLAGS=-j`nproc` make -C sims/verilator VERILATOR_THREADS=N CONFIG=TapNICRocketConfig && ./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig toolchains/riscv-tools/riscv-tests/build/isa/rv64ui-p-simple
for various values of N. I'm also using rv64ui-p-simple because it doesn't seem important which test program to use: whether it segfaults or not happens on the first network_tick call and is apparently independent of whatever the simulated binary does.
Ah, one speculative code change later and, oh, hello:
[UART] UART0 is here (stdin/stdout).
network init (tid=536361)
No tap interface provided
network tick (tid=536364)
- /home/seth/Code/src/github.com/ucb-bar/chipyard/sims/verilator/generated-src/chipyard.harness.TestHarness.TapNICRocketConfig/gen-collateral/TestDriver.v:158: Verilog $finish
init and tick ran in different OS threads, and yet no crash! I got to that result by noticing that the crash was very different in the case where someone had previously populated the thread-local.
In reading the code to try and identify how the control flow ought to return, I noticed that both the device and the switch had a pointer to another thread's local storage (they were both storing cur in a field). So, overriding some access modifiers so I could update them from network_tick:
if (!netdev || !netsw) {
fprintf(stderr, "You forgot to call network_init!");
exit(1);
}
+ netdev->target = context_t::current();
+ netsw->main = context_t::current();
netdev->tick(out_valid, out_data, out_last);
netdev->switch_to_host();
netsw->distribute();
netsw->switch_to_worker();
and, since context_t::current exists to populate the thread-local, it seems neither crash occurred.
I'm still not quite sure what all this means yet, nor what course of action it suggests, but I did find that result very interesting.
Ok, so I've experimented a bit more, and I've come up with three potentially useful perspectives here:
- SimNetwork (& SimBlockDevice) are mis-using context_t by stashing the pointer from context_t::current in a field during construction. That's always a pointer to a thread-local, and if the constructor is called in a different thread (as here), that gets very difficult to reason about. A change suggested by this perspective is an iteration on (but functionally the same as) "update in network tick" from above, maybe something like this in device.h:
  ```diff
  - void switch_to_host(void) { host.switch_to(); }
  + void switch_to_host(void) { target = context_t::current(); host.switch_to(); target = NULL; }
  ```
- context_t is challenging to use correctly, especially in a threaded program. An idea for how to improve the surface a bit would be to promote prev to a thread-local itself and add something like:
  ```cpp
  void context_t::yield() {
  #ifdef USE_UCONTEXT
    if (swapcontext(cur->context.get(), prev->context.get()) != 0) abort();
  #else
    assert(false && "todo!"); abort();
  #endif
  }
  ```
  which would allow NetworkDevice & NetworkSwitch a way to pass control flow back to the simulation without needing to stash their own context pointers. The full fix would also require changing context_t::switch_to so that it always initializes the thread-local if it's NULL (or, possibly better yet, just make cur something like static __thread context_t cur, so it's always valid memory).
- verilator's threading model is already providing (effectively) stackless (micro-)tasks that can be efficiently distributed to threads. context_t does provide stackful coroutines, but the gain from using context_t might be relatively small for its cost in complexity (and the implicit syscall to set the signal mask every swap). The direction suggested here would be to refactor the tick functions to return after a single step (i.e. on the current loop back-edge); to my eye that looks fairly achievable for the NetworkDevice & NetworkSwitch after shuffling a few stack-locals to become class members or shim'd-in arguments. BlockDevice would be even simpler, and I don't see any other uses of fesvr/context.h in my clone of the chipyard repo at this time (although I only have the default list of submodules from ./build-setup.sh).
I'd be lying if I said I didn't see the last as the most pragmatic option. I also feel like it's a bit of a loss: I've really enjoyed learning about the ucontext_t stuff, and I appreciate the utility of being able to multiplex lightweight user tasks over the same thread. That said, I spend a lot of time learning about weird corners of computing, and it was still very strange to me on first encounter. That to me is an important signal about code accessibility, and for the same reason non-local control flow is... well, best used sparingly.
I believe I can fill out any of those three directions into a full-fledged PR to the upstream(s) in question here (IceNet & testchipip, or riscv-isa-sim). Do any of them especially call to you, @jerryz123 ? I see that you've done a lot of this work, so I suspect you'd have a better sense for the overall context (pun intended).
SimNetwork (& SimBlockDevice) are mis-using context_t by stashing the pointer from context_t::current in a field during construction. That's always a pointer to a thread-local, and if the constructor is called in a different thread (as here), that gets very difficult to reason about.
While I agree with your reasoning here, I don't think it's reasonable to expect/require SimDevice implementations to be thread-safe, where the constructor/tick functions can be called from distinct threads. As far as I can tell, this quirk only appears with Verilator multi-threading. The other simulators don't do this, even with multithreading enabled.
context_t is challenging to use correctly, especially in a threaded program. An idea for how to improve the surface a bit would be to promote prev to a thread-local itself and add something like:
Does this generalize to systems with multiple contexts? IMO it's better to require the programmer to explicitly specify the next context to execute. There are use-cases of context_t which have multiple contexts (not just target/host).
The direction suggested here would be to refactor the tick functions to return after a single step
The htif/tsi mechanism uses context_t, but I believe the implementation is buried with the static FESVR library, which is compiled as part of spike (Spike uses htif and context_t as well in its own simulation loop).
The FireSim FPGA emulation driver also heavily uses context_t.
Another example is the tick function for SpikeTile, which allows the Spike C++ core model to interact with the Chipyard RTL simulation. https://github.com/ucb-bar/chipyard/blob/eb6910aae00bcbad9b8f09fe40d0bc419fe42cbf/generators/chipyard/src/main/resources/csrc/spiketile.cc#L180
I couldn't think of a way to make that system work without context_t.
My belief here is that the init/tick functions being called from separate threads is a Verilator-specific quirk that we should work around with minimal disruption to existing other code/interfaces. Perhaps the simplest thing is to merge network_tick and network_init, and make the tick function initialize the devices on-demand?
Thank you for digging into this, I've learned quite a bit about the subtleties of the context_t behavior in a multi-threaded system from your analysis.
While I agree with your reasoning here, I don't think it's reasonable to expect/require SimDevice implementations to be thread-safe, where the constructor/tick functions can be called from distinct threads.
Yeah, I hear you about not wanting the {runtime,complexity} overhead of generalized thread-safety in the simulated devices. I want to note that the situation here calls for a much narrower "kind" of thread safety: it's the difference between what Rust calls Sync (you might be called from multiple threads at the same time) and the much simpler Send (it's safe to move the resource between threads)[^1][^2]. There is a strict happens-before relationship between the verilog initial blocks that call init and the always blocks that call tick, which it appears that verilator correctly implements. So to achieve correctness here it's not that the sim needs to handle multiple concurrent callers; it more or less just needs to avoid stashing a reference to another thread's local storage.
[^1]: It's fairly Rust-jargon-rich, but the rust user forums have a good discussion about what being !Send + Sync means.
[^2]: The only other non-experiential citation I have for this is the Rustonomicon, which unhelpfully states "A type is Send if it is safe to send it to another thread," and seems to mistake its own premise further down (filed as https://github.com/rust-lang/nomicon/issues/453 , for anyone reading that desires homework from a footnote).
As far as I can tell, this quirk only appears with Verilator multi-threading. The other simulators don't do this, even with multithreading enabled.
To be fully transparent, I have only a few dozen hours' worth of experience with any of the commercial verilog implementations, and none to the depth I've gotten here with verilator. I do wonder if it's by accident or by design that the other simulators don't encounter this behavior: do you know if there's some verilog standard (implicit or explicit) that verilator is violating here by evaluating the initial block in a different thread than the always block? If it should be treating the module, say, as the "unit" of work (perhaps iff the module contains DPI?), that's something it might be worth raising with them upstream, too.
Does this generalize to systems with multiple contexts? IMO it's better to require the programmer to explicitly specify the next context to execute. There are use-cases of context_t which have multiple contexts (not just target/host).
The yield semantics I implemented do generalize in the sense that every context_t has a most-recently-swapped antecedent, but as written would probably produce surprising behavior when trying to "nest" contexts (M switch_to A switch_to B followed by yield would "return" to A, but a second yield from A would pass control flow back to B). We could imagine a context "stack", with switch_to pushing a task, and a definition of yield that acts as a "pop." I believe that would work to implement arbitrarily nested contexts just fine, although there are other simple solutions too when the number of tasks is small and statically fixed, as I think is the case here[^3].
[^3]: If you're curious, I have an example that I'm playing with: https://github.com/sethp/ucontext-coroutine . I haven't gotten to threading just yet, and I suspect my implementation is broken even for tail-recursive / single-recursion cases, but it's been enlightening.
My belief here is that the init/tick functions being called from separate threads is a Verilator-specific quirk that we should work around with minimal disruption to existing other code/interfaces.
I appreciate the examples! I'm glad to have the benefit of your experience here. I'd agree at this point that "avoid use of context_t entirely" is a path not worth further exploration.
Unfortunately, I'm not sure there's a general resolution that doesn't involve at least looking at the other use sites: neither threading nor non-local control flow are famous for composing well. I haven't identified any answers that reside entirely within context_t or the verilated main or some other high-leverage point that would span all devices (at least, not yet).
Perhaps the simplest thing is to merge network_tick and network_init, and make the tick function initialize the devices on-demand?
It's a good idea, that's how it seems the spike tile (and, perhaps, htif?) gets away with using context_t when verilated as a multi-threaded model. I considered it, but I didn't bring it up, because it has some immediate consequences that I presumed would be disqualifying (all the _tick interfaces would have to take all the _init parameters, for example).
I'm also not entirely sure how durable it would be, as a solution: the verilator documentation on task scheduling suggests that they tried both static and dynamic scheduling and went for the static for performance (rather than correctness) reasons. I suspect a dynamically scheduled runtime (e.g. one based on work-stealing) would probably cause even spiketile.cc to spontaneously fail, as it got moved around between threads.
My guess is that's a decision that's unlikely to be reversed any time soon ("efficient dynamic scheduling" notwithstanding), and you're in the much better position than I to know if "all DPI-C that uses context_t has a single verilog-facing entrypoint" is an invariant you feel is more maintainable.
I think my plan at this point is to continue experimenting with ucontexts to get a better understanding of what it means to nest them (& how else they're used by htif & friends), and whether there's somewhere besides a thread-local to pass task-local sideband data to try and break the coupling there.
Pursuing a definition of a task that didn't care what thread it was scheduled on (as long as it wasn't scheduled more than once) seems like it offers a resolution that's relatively low-impact and high-durability to me.
Thank you for digging into this, I've learned quite a bit about the subtleties of the context_t behavior in a multi-threaded system from your analysis.
Thank you for reading, and for the feedback! I'm glad you've found it helpful, I've very much enjoyed learning about all these fine details as well :smile:
I think I finally understand the problem here well enough that I feel confident about what's happening. Much of the clarity came when I got curious about the question "why is assigning target = context_t::current() not sending us back to the initial block when we target->switch_to()?". I wrote a small sample program to investigate[^sample], but the short version is that the referent of current() is not stable: it's (sometimes) internally mutated as a side effect of calling ::switch_to().
I found that even in single threaded mode, target->switch_to() took me back to a surprising point—it only "resumes" the target simulation when the implicit second parameter to swapcontext (via the cur thread local) points to the same referent as we captured during initialization.
I understand the desire to move responsibility for that to the verilog implementation, but I don't see how to do so effectively. In this case there are three non-lexical scopes that all need to line up (the thread local, the init, and the tick), but since context_t::current() could point to any user-allocated structure (not just the anonymous thread-local one), and be captured behind any DPI-C call, any boundary we draw here feels somewhat arbitrary to me. And, "do all of the simulators agree about what is the atomic unit of thread-binding, and does that cover every deferred reference to context_t::current()" is a much harder property to pattern match on than "does this switch_to call immediately update something with context_t::current() just prior?"
So, I'd like to pitch a three step plan:
1. Repair the network device by updating target = context_t::current() just before calling host.switch_to() in netdev (& similar for the netsw), as mentioned above. This ensures the invariant that target always points to the task we're about to park, and therefore works across time & threads both.
2. Identify some other usages of context_t and evaluate a similar repair. A quick grep suggests there's on the order of a half dozen or so usages of context_t in chipyard, so repair should only take a few days of effort and can be incremental. IMO it's ok for this step to be best-effort, because one way identification works is "someone reports an issue about a segfault".
3. Potentially, revisit the idea to invent a new semantic to be more explicit about the referent we want to update as a side effect of the switch_to. How often this pattern appears suggests to me that it is indeed something worth looking into, and context_t::current() may even be worth deprecating, since capturing its result is misuse-prone.
What do you think? Would you be willing to accept a change like 1 and piecemeal updates for 2?