torchchat
AOTI/DSO model does not run on Linux
🐛 Describe the bug
I am running an Arch Linux system with a 4090/3090 and an up-to-date CUDA 12.5 (Build cuda_12.5.r12.5/compiler.34385749_0).
I have created a new mamba env for torchchat and run the install. Regular inference (e.g. with `generate`) works fine.
I compile an AOTI model per the README:
❯ time python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torchao/ops.py:12: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
return torch.library.impl_abstract(f"{name}")(func)
Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
NumExpr defaulting to 16 threads.
PyTorch version 2.4.0 available.
Using device=cuda
Loading model...
Time to load model: 2.54 seconds
-----------------------------------------------------------
Exporting model using AOT Inductor to /home/local/torchchat/exportedModels/llama3.1.so
W0802 22:25:40.607000 126075654027072 torch/fx/experimental/symbolic_shapes.py:4449] xindex is not in var_ranges, defaulting to unknown range.
In file included from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/IListRef.h:631,
from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/DeviceGuard.h:3,
from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/ATen.h:9,
from /home/local/torchchat/exportedModels/ca5ydbysfhhoy7a5vyb5c26c642lglqngoqmpxtzrmq77e6kbqqx.cpp:443:
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/IListRef_inl.h: In static member function ‘static c10::detail::IListRefConstRef<at::OptionalTensorRef> c10::detail::IListRefTagImpl<c10::IListRefTag::Boxed, at::OptionalTensorRef>::iterator_get(const c10::List<std::optional<at::Tensor> >::const_iterator&)’:
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/IListRef_inl.h:171:17: warning: possibly dangling reference to a temporary [-Wdangling-reference]
171 | const auto& ivalue = (*it).get();
| ^~~~~~
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/IListRef_inl.h:171:35: note: the temporary was destroyed at the end of the full expression ‘(& it)->c10::impl::ListIterator<std::optional<at::Tensor>, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator*().c10::impl::ListElementReference<std::optional<at::Tensor>, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::get()’
171 | const auto& ivalue = (*it).get();
| ~~~~~~~~~^~
In file included from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/OperatorEntry.h:12,
from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:6,
from /home/local/torchchat/exportedModels/ca5ydbysfhhoy7a5vyb5c26c642lglqngoqmpxtzrmq77e6kbqqx.cpp:444:
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/DispatchKeyExtractor.h: In lambda function:
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/DispatchKeyExtractor.h:154:32: warning: possibly dangling reference to a temporary [-Wdangling-reference]
154 | for (const at::Tensor& tensor : ivalue.toTensorList()) {
| ^~~~~~
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/DispatchKeyExtractor.h:154:61: note: the temporary was destroyed at the end of the full expression ‘__for_begin .c10::impl::ListIterator<at::Tensor, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator*().c10::impl::ListElementReference<at::Tensor, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator std::conditional_t<true, const at::Tensor&, at::Tensor>()’
154 | for (const at::Tensor& tensor : ivalue.toTensorList()) {
| ^
The generated DSO model can be found at: /home/local/torchchat/exportedModels/llama3.1.so
real 2m2.058s
user 1m24.277s
sys 0m39.165s
When I try to run generation with the exported DSO model, it gives an error:
python3 torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --prompt "Hello my name is"
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torchao/ops.py:12: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
return torch.library.impl_abstract(f"{name}")(func)
Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
NumExpr defaulting to 16 threads.
PyTorch version 2.4.0 available.
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Using device=cuda NVIDIA GeForce RTX 4090
Loading model...
Time to load model: 2.65 seconds
Error: CUDA error: out of memory
Traceback (most recent call last):
File "/home/local/torchchat/build/builder.py", line 468, in _initialize_model
model.forward = torch._export.aot_load(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/_export/__init__.py", line 425, in aot_load
runner = torch._C._aoti.AOTIModelContainerRunnerCuda(so_path, 1, device) # type: ignore[assignment, call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: create_func_( &container_handle_, num_models, device_str.c_str(), cubin_dir.empty() ? nullptr : cubin_dir.c_str()) API call failed at ../torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 49
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/local/torchchat/torchchat.py", line 88, in <module>
generate_main(args)
File "/home/local/torchchat/generate.py", line 838, in main
gen = Generator(
^^^^^^^^^^
File "/home/local/torchchat/generate.py", line 205, in __init__
self.model = _initialize_model(self.builder_args, self.quantize, self.tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/local/torchchat/build/builder.py", line 472, in _initialize_model
raise RuntimeError(f"Failed to load AOTI compiled {builder_args.dso_path}")
RuntimeError: Failed to load AOTI compiled exportedModels/llama3.1.so
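For reference, the failure seems to come from the bare aot_load call shown in the traceback, so it can probably be reproduced outside torchchat. A minimal sketch, assuming the same environment and the .so produced by the export step above (the "cuda" device string mirrors what build/builder.py appears to pass):

```python
# Minimal repro sketch (assumption: same conda env and exported .so as above).
# This is the call that fails inside build/builder.py per the traceback.
import torch

so_path = "exportedModels/llama3.1.so"  # produced by the export command above
forward = torch._export.aot_load(so_path, "cuda")  # raises the same CUDA OOM / create_func_ error here
```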
I tried the C++ runner as well, but it fails to build:
❯ scripts/build_native.sh aoti
+ '[' 1 -eq 0 ']'
+ (( 1 ))
+ case "$1" in
+ echo 'Building aoti native runner...'
Building aoti native runner...
+ TARGET=aoti
+ shift
+ (( 0 ))
+ '[' -z '' ']'
+++ dirname scripts/build_native.sh
++ cd scripts
++ pwd -P
+ SCRIPT_PATH=/home/local/torchchat/scripts
++ dirname /home/local/torchchat/scripts
+ TORCHCHAT_ROOT=/home/local/torchchat
+ '[' -z '' ']'
+ ET_BUILD_DIR=et-build
+ source /home/local/torchchat/scripts/install_utils.sh
++ set -ex pipefail
++ COMMON_CMAKE_ARGS=' -DCMAKE_BUILD_TYPE=Release -DEXECUTORCH_ENABLE_LOGGING=ON -DEXECUTORCH_LOG_LEVEL=Info -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON -DEXECUTORCH_BUILD_XNNPACK=ON'
+ pushd /home/local/torchchat
~/torchchat ~/torchchat
+ git submodule update --init
Submodule 'tokenizer/third-party/abseil-cpp' (https://github.com/abseil/abseil-cpp.git) registered for path 'tokenizer/third-party/abseil-cpp'
Submodule 'tokenizer/third-party/re2' (https://github.com/google/re2.git) registered for path 'tokenizer/third-party/re2'
Submodule 'tokenizer/third-party/sentencepiece' (https://github.com/google/sentencepiece.git) registered for path 'tokenizer/third-party/sentencepiece'
Cloning into '/home/local/torchchat/tokenizer/third-party/abseil-cpp'...
Cloning into '/home/local/torchchat/tokenizer/third-party/re2'...
Cloning into '/home/local/torchchat/tokenizer/third-party/sentencepiece'...
Submodule path 'tokenizer/third-party/abseil-cpp': checked out '854193071498f330b71083d7e06a7cd18e02a4cc'
Submodule path 'tokenizer/third-party/re2': checked out 'ac82d4f628a2045d89964ae11c48403d3b091af1'
Submodule path 'tokenizer/third-party/sentencepiece': checked out '7dcb541451b1862d73f473b3804ccf8f2a9e10f6'
+ git submodule sync
Synchronizing submodule url for 'tokenizer/third-party/abseil-cpp'
Synchronizing submodule url for 'tokenizer/third-party/re2'
Synchronizing submodule url for 'tokenizer/third-party/sentencepiece'
+ [[ aoti == \e\t ]]
+ popd
~/torchchat
+ [[ aoti == \e\t ]]
++ python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'
+ cmake -S . -B ./cmake-out -DCMAKE_PREFIX_PATH=/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/share/cmake -DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 -G Ninja
-- The C compiler identification is GNU 14.1.1
-- The CXX compiler identification is GNU 14.1.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX17
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX17 - Success
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX20
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX20 - Failed
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Warning at tokenizer/third-party/abseil-cpp/CMakeLists.txt:193 (message):
The default and system-level install directories are unsupported except in LTS releases of Abseil. Please set CMAKE_INSTALL_PREFIX to install Abseil in your source or build tree directly.
CMake Deprecation Warning at tokenizer/third-party/sentencepiece/CMakeLists.txt:15 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
-- VERSION: 0.2.1
-- Found TCMalloc: /usr/lib/libtcmalloc_minimal.so
-- Using ET BUILD DIR: --[et-build]--
-- TORCHCHAT_ROOT="/home/local/torchchat"
-- Looking for excutorch in /home/local/torchchat/et-build/install
-- Could NOT find executorch (missing: executorch_DIR)
CMake Warning at runner/et.cmake:130 (MESSAGE):
ExecuTorch package not found
Call Stack (most recent call first):
CMakeLists.txt:15 (include)
CMake Warning (dev) at runner/aoti.cmake:16 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
Call Stack (most recent call first):
CMakeLists.txt:21 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found CUDA: /opt/cuda (found version "12.5")
-- Found CUDA: /opt/cuda (found version "12.5")
-- The CUDA compiler identification is NVIDIA 12.5.82
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /opt/cuda/include (found version "12.5.82")
-- Caffe2: CUDA detected: 12.5
-- Caffe2: CUDA nvcc is: /opt/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /opt/cuda
-- Caffe2: Header version is: 12.5
-- /opt/cuda/lib/libnvrtc.so shorthash is a50b0e02
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s): 8.9 8.6
-- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_86,code=sm_86
CMake Warning at /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
runner/aoti.cmake:18 (find_package)
CMakeLists.txt:21 (include)
-- Found Torch: /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/lib/libtorch.so (Required is at least version "2.4.0")
-- Configuring done (4.2s)
-- Generating done (0.1s)
-- Build files have been written to: /home/local/torchchat/cmake-out
+ cmake --build ./cmake-out --target aoti_run
[63/222] Building CXX object tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o
FAILED: tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o
/usr/bin/c++ -I/home/local/torchchat/tokenizer -I/home/local/torchchat/tokenizer/third-party/sentencepiece/src -I/home/local/torchchat/tokenizer/third-party/re2 -I/home/local/torchchat/tokenizer/third-party/abseil-cpp -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o -MF tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o.d -o tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o -c /home/local/torchchat/tokenizer/tiktoken.cpp
In file included from /home/local/torchchat/tokenizer/tiktoken.cpp:18:
/home/local/torchchat/tokenizer/base64.h:37:11: error: ‘uint32_t’ does not name a type
37 | constexpr uint32_t DECODE_TABLE[] = {
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:29:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
28 | #include <string>
+++ |+#include <cstdint>
29 | #include <string_view>
/home/local/torchchat/tokenizer/base64.h:57:13: error: variable or field ‘validate’ declared void
57 | inline void validate(uint32_t v) {
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:57:22: error: ‘uint32_t’ was not declared in this scope
57 | inline void validate(uint32_t v) {
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:57:22: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h: In function ‘void base64::detail::decode(const std::string_view&, std::string&)’:
/home/local/torchchat/tokenizer/base64.h:70:3: error: ‘uint32_t’ was not declared in this scope
70 | uint32_t val = 0;
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:70:3: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:72:3: error: ‘uint8_t’ was not declared in this scope
72 | uint8_t c = input[0];
| ^~~~~~~
/home/local/torchchat/tokenizer/base64.h:72:3: note: ‘uint8_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:73:12: error: ‘DECODE_TABLE’ was not declared in this scope
73 | auto v = DECODE_TABLE[c];
| ^~~~~~~~~~~~
/home/local/torchchat/tokenizer/base64.h:73:25: error: ‘c’ was not declared in this scope
73 | auto v = DECODE_TABLE[c];
| ^
/home/local/torchchat/tokenizer/base64.h:74:3: error: ‘validate’ was not declared in this scope
74 | validate(v);
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:75:3: error: ‘val’ was not declared in this scope
75 | val = v;
| ^~~
/home/local/torchchat/tokenizer/base64.h: In function ‘void base64::detail::decode_1_padding(const std::string_view&, std::string&)’:
/home/local/torchchat/tokenizer/base64.h:105:3: error: ‘uint32_t’ was not declared in this scope
105 | uint32_t val = 0;
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:105:3: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:107:3: error: ‘uint8_t’ was not declared in this scope
107 | uint8_t c = input[0];
| ^~~~~~~
/home/local/torchchat/tokenizer/base64.h:107:3: note: ‘uint8_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:108:12: error: ‘DECODE_TABLE’ was not declared in this scope
108 | auto v = DECODE_TABLE[c];
| ^~~~~~~~~~~~
/home/local/torchchat/tokenizer/base64.h:108:25: error: ‘c’ was not declared in this scope
108 | auto v = DECODE_TABLE[c];
| ^
/home/local/torchchat/tokenizer/base64.h:109:3: error: ‘validate’ was not declared in this scope
109 | validate(v);
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:110:3: error: ‘val’ was not declared in this scope
110 | val = v;
| ^~~
/home/local/torchchat/tokenizer/base64.h: In function ‘void base64::detail::decode_2_padding(const std::string_view&, std::string&)’:
/home/local/torchchat/tokenizer/base64.h:131:3: error: ‘uint32_t’ was not declared in this scope
131 | uint32_t val = 0;
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:131:3: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:133:3: error: ‘uint8_t’ was not declared in this scope
133 | uint8_t c = input[0];
| ^~~~~~~
/home/local/torchchat/tokenizer/base64.h:133:3: note: ‘uint8_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:134:12: error: ‘DECODE_TABLE’ was not declared in this scope
134 | auto v = DECODE_TABLE[c];
| ^~~~~~~~~~~~
/home/local/torchchat/tokenizer/base64.h:134:25: error: ‘c’ was not declared in this scope
134 | auto v = DECODE_TABLE[c];
| ^
/home/local/torchchat/tokenizer/base64.h:135:3: error: ‘validate’ was not declared in this scope
135 | validate(v);
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:136:3: error: ‘val’ was not declared in this scope
136 | val = v;
| ^~~
[96/222] Building CXX object CMakeFiles/aoti_run.dir/runner/run.cpp.o
ninja: build stopped: subcommand failed.
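The C++ runner failure looks unrelated to CUDA or AOTI: with GCC 14, tokenizer/base64.h uses uint32_t/uint8_t without including <cstdint>, which newer libstdc++ headers apparently no longer pull in transitively. GCC's fix-it in the output above already points at the workaround; a minimal sketch of the local edit (not an upstream-confirmed patch):

```cpp
// tokenizer/base64.h, next to the existing includes shown in the error output.
// Declaring uint32_t/uint8_t via <cstdint> follows GCC 14's fix-it suggestion.
#include <string>
#include <cstdint>  // added locally as a workaround sketch
#include <string_view>
```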
Versions
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Arch Linux (x86_64)
GCC version: (GCC) 14.1.1 20240720
Clang version: 18.1.8
CMake version: version 3.30.1
Libc version: glibc-2.40
Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-6.10.0-arch1-2-x86_64-with-glibc2.40
Is CUDA available: True
CUDA runtime version: 12.5.82
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 4090
Nvidia driver version: 555.58.02
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.9.2.1
/usr/lib/libcudnn_adv.so.9.2.1
/usr/lib/libcudnn_cnn.so.9.2.1
/usr/lib/libcudnn_engines_precompiled.so.9.2.1
/usr/lib/libcudnn_engines_runtime_compiled.so.9.2.1
/usr/lib/libcudnn_graph.so.9.2.1
/usr/lib/libcudnn_heuristic.so.9.2.1
/usr/lib/libcudnn_ops.so.9.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 5950X 16-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 69%
CPU max MHz: 5083.3979
CPU min MHz: 2200.0000
BogoMIPS: 6802.30
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
L1d cache: 512 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 8 MiB (16 instances)
L3 cache: 64 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch-triton==3.0.0+dedb7bdf33
[pip3] torch==2.4.0
[pip3] torchao==0.3.1
[pip3] torchaudio==2.4.0
[pip3] torchvideo==0.0.0
[pip3] triton==3.0.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] pytorch-triton 3.0.0+dedb7bdf33 pypi_0 pypi
[conda] torch 2.4.0 pypi_0 pypi
[conda] torchao 0.3.1 pypi_0 pypi
[conda] torchaudio 2.4.0 pypi_0 pypi
[conda] torchvideo 0.0.0 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi