torchchat
AOTI/DSO model does not run on Linux
🐛 Describe the bug
I am running an Arch Linux system with a 4090/3090 and an up-to-date CUDA 12.5 (Build cuda_12.5.r12.5/compiler.34385749_0).
I have created a new mamba env for torchchat and run the install. Regular inference (e.g. with `generate`) works fine.
I compile an AOTI model per the README:
❯ time python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torchao/ops.py:12: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
return torch.library.impl_abstract(f"{name}")(func)
Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
NumExpr defaulting to 16 threads.
PyTorch version 2.4.0 available.
Using device=cuda
Loading model...
Time to load model: 2.54 seconds
-----------------------------------------------------------
Exporting model using AOT Inductor to /home/local/torchchat/exportedModels/llama3.1.so
W0802 22:25:40.607000 126075654027072 torch/fx/experimental/symbolic_shapes.py:4449] xindex is not in var_ranges, defaulting to unknown range.
In file included from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/IListRef.h:631,
from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/DeviceGuard.h:3,
from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/ATen.h:9,
from /home/local/torchchat/exportedModels/ca5ydbysfhhoy7a5vyb5c26c642lglqngoqmpxtzrmq77e6kbqqx.cpp:443:
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/IListRef_inl.h: In static member function ‘static c10::detail::IListRefConstRef<at::OptionalTensorRef> c10::detail::IListRefTagImpl<c10::IListRefTag::Boxed, at::OptionalTensorRef>::iterator_get(const c10::List<std::optional<at::Tensor> >::const_iterator&)’:
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/IListRef_inl.h:171:17: warning: possibly dangling reference to a temporary [-Wdangling-reference]
171 | const auto& ivalue = (*it).get();
| ^~~~~~
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/IListRef_inl.h:171:35: note: the temporary was destroyed at the end of the full expression ‘(& it)->c10::impl::ListIterator<std::optional<at::Tensor>, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator*().c10::impl::ListElementReference<std::optional<at::Tensor>, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::get()’
171 | const auto& ivalue = (*it).get();
| ~~~~~~~~~^~
In file included from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/OperatorEntry.h:12,
from /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:6,
from /home/local/torchchat/exportedModels/ca5ydbysfhhoy7a5vyb5c26c642lglqngoqmpxtzrmq77e6kbqqx.cpp:444:
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/DispatchKeyExtractor.h: In lambda function:
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/DispatchKeyExtractor.h:154:32: warning: possibly dangling reference to a temporary [-Wdangling-reference]
154 | for (const at::Tensor& tensor : ivalue.toTensorList()) {
| ^~~~~~
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/include/ATen/core/dispatch/DispatchKeyExtractor.h:154:61: note: the temporary was destroyed at the end of the full expression ‘__for_begin .c10::impl::ListIterator<at::Tensor, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator*().c10::impl::ListElementReference<at::Tensor, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator std::conditional_t<true, const at::Tensor&, at::Tensor>()’
154 | for (const at::Tensor& tensor : ivalue.toTensorList()) {
| ^
The generated DSO model can be found at: /home/local/torchchat/exportedModels/llama3.1.so
real 2m2.058s
user 1m24.277s
sys 0m39.165s
When I try to run generation with the exported DSO model, it gives an error:
python3 torchchat.py generate llama3.1 --dso-path exportedModels/llama3.1.so --prompt "Hello my name is"
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torchao/ops.py:12: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
return torch.library.impl_abstract(f"{name}")(func)
Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
NumExpr defaulting to 16 threads.
PyTorch version 2.4.0 available.
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Using device=cuda NVIDIA GeForce RTX 4090
Loading model...
Time to load model: 2.65 seconds
Error: CUDA error: out of memory
Traceback (most recent call last):
File "/home/local/torchchat/build/builder.py", line 468, in _initialize_model
model.forward = torch._export.aot_load(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/_export/__init__.py", line 425, in aot_load
runner = torch._C._aoti.AOTIModelContainerRunnerCuda(so_path, 1, device) # type: ignore[assignment, call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: create_func_( &container_handle_, num_models, device_str.c_str(), cubin_dir.empty() ? nullptr : cubin_dir.c_str()) API call failed at ../torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 49
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/local/torchchat/torchchat.py", line 88, in <module>
generate_main(args)
File "/home/local/torchchat/generate.py", line 838, in main
gen = Generator(
^^^^^^^^^^
File "/home/local/torchchat/generate.py", line 205, in __init__
self.model = _initialize_model(self.builder_args, self.quantize, self.tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/local/torchchat/build/builder.py", line 472, in _initialize_model
raise RuntimeError(f"Failed to load AOTI compiled {builder_args.dso_path}")
RuntimeError: Failed to load AOTI compiled exportedModels/llama3.1.so
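For reference, the failure seems to come from the bare aot_load call shown in the traceback, so it can probably be reproduced outside torchchat. A minimal sketch, assuming the same environment and the .so produced by the export step above (the "cuda" device string mirrors what build/builder.py appears to pass):

```python
# Minimal repro sketch (assumption: same conda env and exported .so as above).
# This is the call that fails inside build/builder.py per the traceback.
import torch

so_path = "exportedModels/llama3.1.so"  # produced by the export command above
forward = torch._export.aot_load(so_path, "cuda")  # raises the same CUDA OOM / create_func_ error here
```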
I tried the C++ runner as well, but it fails to build:
❯ scripts/build_native.sh aoti
+ '[' 1 -eq 0 ']'
+ (( 1 ))
+ case "$1" in
+ echo 'Building aoti native runner...'
Building aoti native runner...
+ TARGET=aoti
+ shift
+ (( 0 ))
+ '[' -z '' ']'
+++ dirname scripts/build_native.sh
++ cd scripts
++ pwd -P
+ SCRIPT_PATH=/home/local/torchchat/scripts
++ dirname /home/local/torchchat/scripts
+ TORCHCHAT_ROOT=/home/local/torchchat
+ '[' -z '' ']'
+ ET_BUILD_DIR=et-build
+ source /home/local/torchchat/scripts/install_utils.sh
++ set -ex pipefail
++ COMMON_CMAKE_ARGS=' -DCMAKE_BUILD_TYPE=Release -DEXECUTORCH_ENABLE_LOGGING=ON -DEXECUTORCH_LOG_LEVEL=Info -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON -DEXECUTORCH_BUILD_XNNPACK=ON'
+ pushd /home/local/torchchat
~/torchchat ~/torchchat
+ git submodule update --init
Submodule 'tokenizer/third-party/abseil-cpp' (https://github.com/abseil/abseil-cpp.git) registered for path 'tokenizer/third-party/abseil-cpp'
Submodule 'tokenizer/third-party/re2' (https://github.com/google/re2.git) registered for path 'tokenizer/third-party/re2'
Submodule 'tokenizer/third-party/sentencepiece' (https://github.com/google/sentencepiece.git) registered for path 'tokenizer/third-party/sentencepiece'
Cloning into '/home/local/torchchat/tokenizer/third-party/abseil-cpp'...
Cloning into '/home/local/torchchat/tokenizer/third-party/re2'...
Cloning into '/home/local/torchchat/tokenizer/third-party/sentencepiece'...
Submodule path 'tokenizer/third-party/abseil-cpp': checked out '854193071498f330b71083d7e06a7cd18e02a4cc'
Submodule path 'tokenizer/third-party/re2': checked out 'ac82d4f628a2045d89964ae11c48403d3b091af1'
Submodule path 'tokenizer/third-party/sentencepiece': checked out '7dcb541451b1862d73f473b3804ccf8f2a9e10f6'
+ git submodule sync
Synchronizing submodule url for 'tokenizer/third-party/abseil-cpp'
Synchronizing submodule url for 'tokenizer/third-party/re2'
Synchronizing submodule url for 'tokenizer/third-party/sentencepiece'
+ [[ aoti == \e\t ]]
+ popd
~/torchchat
+ [[ aoti == \e\t ]]
++ python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'
+ cmake -S . -B ./cmake-out -DCMAKE_PREFIX_PATH=/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/share/cmake -DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 -G Ninja
-- The C compiler identification is GNU 14.1.1
-- The CXX compiler identification is GNU 14.1.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX17
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX17 - Success
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX20
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX20 - Failed
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Warning at tokenizer/third-party/abseil-cpp/CMakeLists.txt:193 (message):
The default and system-level install directories are unsupported except in LTS releases of Abseil. Please set CMAKE_INSTALL_PREFIX to install Abseil in your source or build tree directly.
CMake Deprecation Warning at tokenizer/third-party/sentencepiece/CMakeLists.txt:15 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
-- VERSION: 0.2.1
-- Found TCMalloc: /usr/lib/libtcmalloc_minimal.so
-- Using ET BUILD DIR: --[et-build]--
-- TORCHCHAT_ROOT="/home/local/torchchat"
-- Looking for excutorch in /home/local/torchchat/et-build/install
-- Could NOT find executorch (missing: executorch_DIR)
CMake Warning at runner/et.cmake:130 (MESSAGE):
ExecuTorch package not found
Call Stack (most recent call first):
CMakeLists.txt:15 (include)
CMake Warning (dev) at runner/aoti.cmake:16 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
Call Stack (most recent call first):
CMakeLists.txt:21 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found CUDA: /opt/cuda (found version "12.5")
-- Found CUDA: /opt/cuda (found version "12.5")
-- The CUDA compiler identification is NVIDIA 12.5.82
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /opt/cuda/include (found version "12.5.82")
-- Caffe2: CUDA detected: 12.5
-- Caffe2: CUDA nvcc is: /opt/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /opt/cuda
-- Caffe2: Header version is: 12.5
-- /opt/cuda/lib/libnvrtc.so shorthash is a50b0e02
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s): 8.9 8.6
-- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_86,code=sm_86
CMake Warning at /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
runner/aoti.cmake:18 (find_package)
CMakeLists.txt:21 (include)
-- Found Torch: /home/local/.conda/envs/torchchat/lib/python3.11/site-packages/torch/lib/libtorch.so (Required is at least version "2.4.0")
-- Configuring done (4.2s)
-- Generating done (0.1s)
-- Build files have been written to: /home/local/torchchat/cmake-out
+ cmake --build ./cmake-out --target aoti_run
[63/222] Building CXX object tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o
FAILED: tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o
/usr/bin/c++ -I/home/local/torchchat/tokenizer -I/home/local/torchchat/tokenizer/third-party/sentencepiece/src -I/home/local/torchchat/tokenizer/third-party/re2 -I/home/local/torchchat/tokenizer/third-party/abseil-cpp -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o -MF tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o.d -o tokenizer/CMakeFiles/tokenizer.dir/tiktoken.cpp.o -c /home/local/torchchat/tokenizer/tiktoken.cpp
In file included from /home/local/torchchat/tokenizer/tiktoken.cpp:18:
/home/local/torchchat/tokenizer/base64.h:37:11: error: ‘uint32_t’ does not name a type
37 | constexpr uint32_t DECODE_TABLE[] = {
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:29:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
28 | #include <string>
+++ |+#include <cstdint>
29 | #include <string_view>
/home/local/torchchat/tokenizer/base64.h:57:13: error: variable or field ‘validate’ declared void
57 | inline void validate(uint32_t v) {
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:57:22: error: ‘uint32_t’ was not declared in this scope
57 | inline void validate(uint32_t v) {
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:57:22: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h: In function ‘void base64::detail::decode(const std::string_view&, std::string&)’:
/home/local/torchchat/tokenizer/base64.h:70:3: error: ‘uint32_t’ was not declared in this scope
70 | uint32_t val = 0;
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:70:3: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:72:3: error: ‘uint8_t’ was not declared in this scope
72 | uint8_t c = input[0];
| ^~~~~~~
/home/local/torchchat/tokenizer/base64.h:72:3: note: ‘uint8_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:73:12: error: ‘DECODE_TABLE’ was not declared in this scope
73 | auto v = DECODE_TABLE[c];
| ^~~~~~~~~~~~
/home/local/torchchat/tokenizer/base64.h:73:25: error: ‘c’ was not declared in this scope
73 | auto v = DECODE_TABLE[c];
| ^
/home/local/torchchat/tokenizer/base64.h:74:3: error: ‘validate’ was not declared in this scope
74 | validate(v);
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:75:3: error: ‘val’ was not declared in this scope
75 | val = v;
| ^~~
/home/local/torchchat/tokenizer/base64.h: In function ‘void base64::detail::decode_1_padding(const std::string_view&, std::string&)’:
/home/local/torchchat/tokenizer/base64.h:105:3: error: ‘uint32_t’ was not declared in this scope
105 | uint32_t val = 0;
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:105:3: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:107:3: error: ‘uint8_t’ was not declared in this scope
107 | uint8_t c = input[0];
| ^~~~~~~
/home/local/torchchat/tokenizer/base64.h:107:3: note: ‘uint8_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:108:12: error: ‘DECODE_TABLE’ was not declared in this scope
108 | auto v = DECODE_TABLE[c];
| ^~~~~~~~~~~~
/home/local/torchchat/tokenizer/base64.h:108:25: error: ‘c’ was not declared in this scope
108 | auto v = DECODE_TABLE[c];
| ^
/home/local/torchchat/tokenizer/base64.h:109:3: error: ‘validate’ was not declared in this scope
109 | validate(v);
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:110:3: error: ‘val’ was not declared in this scope
110 | val = v;
| ^~~
/home/local/torchchat/tokenizer/base64.h: In function ‘void base64::detail::decode_2_padding(const std::string_view&, std::string&)’:
/home/local/torchchat/tokenizer/base64.h:131:3: error: ‘uint32_t’ was not declared in this scope
131 | uint32_t val = 0;
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:131:3: note: ‘uint32_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:133:3: error: ‘uint8_t’ was not declared in this scope
133 | uint8_t c = input[0];
| ^~~~~~~
/home/local/torchchat/tokenizer/base64.h:133:3: note: ‘uint8_t’ is defined in header ‘<cstdint>’; this is probably fixable by adding ‘#include <cstdint>’
/home/local/torchchat/tokenizer/base64.h:134:12: error: ‘DECODE_TABLE’ was not declared in this scope
134 | auto v = DECODE_TABLE[c];
| ^~~~~~~~~~~~
/home/local/torchchat/tokenizer/base64.h:134:25: error: ‘c’ was not declared in this scope
134 | auto v = DECODE_TABLE[c];
| ^
/home/local/torchchat/tokenizer/base64.h:135:3: error: ‘validate’ was not declared in this scope
135 | validate(v);
| ^~~~~~~~
/home/local/torchchat/tokenizer/base64.h:136:3: error: ‘val’ was not declared in this scope
136 | val = v;
| ^~~
[96/222] Building CXX object CMakeFiles/aoti_run.dir/runner/run.cpp.o
ninja: build stopped: subcommand failed.
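The C++ runner failure looks unrelated to CUDA or AOTI: with GCC 14, tokenizer/base64.h uses uint32_t/uint8_t without including <cstdint>, which newer libstdc++ headers apparently no longer pull in transitively. GCC's fix-it in the output above already points at the workaround; a minimal sketch of the local edit (not an upstream-confirmed patch):

```cpp
// tokenizer/base64.h, next to the existing includes shown in the error output.
// Declaring uint32_t/uint8_t via <cstdint> follows GCC 14's fix-it suggestion.
#include <string>
#include <cstdint>  // added locally as a workaround sketch
#include <string_view>
```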
Versions
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Arch Linux (x86_64)
GCC version: (GCC) 14.1.1 20240720
Clang version: 18.1.8
CMake version: version 3.30.1
Libc version: glibc-2.40
Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-6.10.0-arch1-2-x86_64-with-glibc2.40
Is CUDA available: True
CUDA runtime version: 12.5.82
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 4090
Nvidia driver version: 555.58.02
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.9.2.1
/usr/lib/libcudnn_adv.so.9.2.1
/usr/lib/libcudnn_cnn.so.9.2.1
/usr/lib/libcudnn_engines_precompiled.so.9.2.1
/usr/lib/libcudnn_engines_runtime_compiled.so.9.2.1
/usr/lib/libcudnn_graph.so.9.2.1
/usr/lib/libcudnn_heuristic.so.9.2.1
/usr/lib/libcudnn_ops.so.9.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 5950X 16-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 69%
CPU max MHz: 5083.3979
CPU min MHz: 2200.0000
BogoMIPS: 6802.30
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
L1d cache: 512 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 8 MiB (16 instances)
L3 cache: 64 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch-triton==3.0.0+dedb7bdf33
[pip3] torch==2.4.0
[pip3] torchao==0.3.1
[pip3] torchaudio==2.4.0
[pip3] torchvideo==0.0.0
[pip3] triton==3.0.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] pytorch-triton 3.0.0+dedb7bdf33 pypi_0 pypi
[conda] torch 2.4.0 pypi_0 pypi
[conda] torchao 0.3.1 pypi_0 pypi
[conda] torchaudio 2.4.0 pypi_0 pypi
[conda] torchvideo 0.0.0 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi