
RuntimeError: Error building extension 'slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0'

Open God-YYB opened this issue 1 year ago • 18 comments

This problem appeared after I resolved "RuntimeError: Ninja is required to load C++ extensions" by running "pip3 install ninja".
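Before rebuilding, it is worth verifying that every piece the JIT build needs is actually visible (a minimal sanity check, not from this thread; adjust for your environment):

ninja --version
nvcc --version    # nvcc must be on PATH for torch.utils.cpp_extension to find it
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"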

God-YYB avatar Jun 19 '24 03:06 God-YYB

Would you please paste the log of your output? I faced the same problem on Windows 11. It is caused by the CUDA libraries: Ninja assembles the nvcc command line incorrectly, so the program cannot find the CUDA libraries. My guess is the space character in the path string. But on Ubuntu 22.04 with NVIDIA driver 535, CUDA 12.1, cuDNN 9, PyTorch 2.3.1, and Ninja installed, there is no problem. PS: the new version of xlstm, 1.0.4, has a bug in the sLSTM layer source; you should try 1.0.3 on Ubuntu.

miaozhixu avatar Jun 19 '24 03:06 miaozhixu

@miaozhixu Hi, may I ask about your setup steps? I am facing the above issue and have not found a solution so far.

Marco-Nguyen avatar Jun 19 '24 09:06 Marco-Nguyen

@miaozhixu Hi, may I ask about your setup steps? I am facing the above issue and have not found a solution so far.

Bring up a fresh Ubuntu 22.04.4 installation. It comes with NVIDIA GPU driver 535. Follow the documentation on nvidia.com to install CUDA 12.1 and cuDNN 9. Install PyTorch 2.3.1 with CUDA support. Use pip to install xlstm; I recommend v1.0.3.
I tried 1.0.4 yesterday, and the return_last_state parameter led to an error.

But somebody said that this issue could be solved with "conda install cccl". Check the link below. https://github.com/NX-AI/xlstm/issues/19#issuecomment-2162005087
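Condensed, the recipe from this comment as commands (the version pin comes from this thread; the cccl line is only the fallback from the linked issue):

pip install xlstm==1.0.3
conda install cccl    # only if the build still cannot find the CCCL headers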

miaozhixu avatar Jun 19 '24 10:06 miaozhixu

So you installed xlstm via pip, not by cloning the repo, right?

Marco-Nguyen avatar Jun 19 '24 10:06 Marco-Nguyen

So you installed xlstm via pip, not by cloning the repo, right?

Yep

miaozhixu avatar Jun 19 '24 13:06 miaozhixu

Check your log file; there should be a few fatal errors, or messages saying some file doesn't exist. I had the same error under Windows 10, then tried Linux and still had the problem. I thought it might be a root or environment problem. Echo your $PATH and $LD_LIBRARY_PATH to see whether they contain the CUDA paths; if not, follow my steps below. First find your CUDA installation location, then use:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Remember to replace the path with your own. This solved the problem for me. The other suggestion, "conda install cccl", didn't work in my situation.
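To make those exports survive new shells, they can be appended to ~/.bashrc (a sketch; replace /usr/local/cuda with your actual install prefix):

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
nvcc --version    # should now resolve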

Also, the README tells us to run "python experiments/main.py --config experiments/parity_xLSTM01.yaml", which gave me a "no such file" error; there is no upper case in the name of the yaml file.
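Assuming the file shipped in the repo is all lowercase, as reported above (the exact name is an assumption; list the directory first to confirm):

ls experiments/*.yaml    # confirm the actual casing
python experiments/main.py --config experiments/parity_xlstm01.yaml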

And you should use "pip install xlstm==1.0.3"; there are other bugs in 1.0.4.

yongyin-ma avatar Jun 20 '24 07:06 yongyin-ma

Check your log file; there should be a few fatal errors, or messages saying some file doesn't exist. I had the same error under Windows 10, then tried Linux and still had the problem. I thought it might be a root or environment problem. Echo your $PATH and $LD_LIBRARY_PATH to see whether they contain the CUDA paths; if not, follow my steps below. First find your CUDA installation location, then use:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Remember to replace the path with your own. This solved the problem for me. The other suggestion, "conda install cccl", didn't work in my situation.

Also, the README tells us to run "python experiments/main.py --config experiments/parity_xLSTM01.yaml", which gave me a "no such file" error; there is no upper case in the name of the yaml file.

And you should use "pip install xlstm==1.0.3"; there are other bugs in 1.0.4.

The yaml file has a case error that is easy to resolve. I will try the PATH solution you mentioned. Thank you

God-YYB avatar Jun 21 '24 02:06 God-YYB

Would you please paste the log of your output? I faced the same problem on Windows 11. It is caused by the CUDA libraries: Ninja assembles the nvcc command line incorrectly, so the program cannot find the CUDA libraries. My guess is the space character in the path string. But on Ubuntu 22.04 with NVIDIA driver 535, CUDA 12.1, cuDNN 9, PyTorch 2.3.1, and Ninja installed, there is no problem. PS: the new version of xlstm, 1.0.4, has a bug in the sLSTM layer source; you should try 1.0.3 on Ubuntu.

Thank you, I will try installing xlstm v1.0.3 via pip later, and the version details are very helpful!

God-YYB avatar Jun 21 '24 02:06 God-YYB

Hi, I am trying to install the same versions as you, but when I install cudnn>9 it returns an error: torch 2.3.1+cu121 requires nvidia-cudnn-cu12==8.9.2.26. How did you do that?

@miaozhixu Hi, may I ask about your setup steps? I am facing the above issue and have not found a solution so far.

Bring up a fresh Ubuntu 22.04.4 installation. It comes with NVIDIA GPU driver 535. Follow the documentation on nvidia.com to install CUDA 12.1 and cuDNN 9. Install PyTorch 2.3.1 with CUDA support. Use pip to install xlstm; I recommend v1.0.3. I tried 1.0.4 yesterday, and the return_last_state parameter led to an error.

But somebody said that this issue could be solved with "conda install cccl". Check the link below. #19 (comment)

yanpeng0520 avatar Jun 21 '24 12:06 yanpeng0520

nvidia-cudnn-cu12==8.9.2.26

I followed nvidia.com's guide to install cuDNN and CUDA; after successfully installing these two libs, install PyTorch.

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-12-1

then add /usr/local/cuda/bin to PATH

sudo apt-get install cudnn9-cuda-12

finally install pytorch:

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

I use Anaconda 3, but I think pip will work just fine.
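Regarding the nvidia-cudnn-cu12==8.9.2.26 conflict above: installing cuDNN through apt keeps it outside pip's dependency resolver, so it cannot clash with the cuDNN wheel pinned by the torch cu121 package. A minimal sketch of that route (the index URL is PyTorch's cu121 wheel index; the version pins are assumptions taken from this thread):

sudo apt-get install cudnn9-cuda-12    # system cuDNN, invisible to pip
pip3 install torch==2.3.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip3 install xlstm==1.0.3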

miaozhixu avatar Jun 21 '24 15:06 miaozhixu

Would you please paste the log of your output? I faced the same problem on Windows 11. It is caused by the CUDA libraries: Ninja assembles the nvcc command line incorrectly, so the program cannot find the CUDA libraries. My guess is the space character in the path string. But on Ubuntu 22.04 with NVIDIA driver 535, CUDA 12.1, cuDNN 9, PyTorch 2.3.1, and Ninja installed, there is no problem. PS: the new version of xlstm, 1.0.4, has a bug in the sLSTM layer source; you should try 1.0.3 on Ubuntu.

Thank you, I will try installing xlstm v1.0.3 via pip later, and the version details are very helpful!

How do you all switch to this Ubuntu version? Isn't it normal to use a laboratory server? Isn't the Ubuntu system on a server fixed? Or are you using a virtual machine? :(

Atlantis-esh avatar Jun 29 '24 03:06 Atlantis-esh


Hello, may I ask how you switched the Ubuntu version to 22.04? Isn't it normal to use a laboratory server? Isn't the Ubuntu system on a server fixed? Or are you all using a virtual machine? :(

Ubuntu is installed on my laptop, alongside Windows 11. This laptop has an RTX 5000 GPU.

miaozhixu avatar Jun 30 '24 08:06 miaozhixu

@miaozhixu I have been puzzled by this problem for a long time. May I ask whether you have successfully run it? If convenient, could you share contact information so we can communicate?

leezhien avatar Jul 22 '24 03:07 leezhien

@miaozhixu I have been puzzled by this problem for a long time. May I ask whether you have successfully run it? If convenient, could you share contact information so we can communicate?

Not a switch, but a fresh installation of Ubuntu.

miaozhixu avatar Jul 26 '24 12:07 miaozhixu

@miaozhixu Hi, may I ask about your setup steps? I am facing the above issue and have not found a solution so far.

Has it been resolved now?

zhonglin-cdut avatar Nov 13 '24 10:11 zhonglin-cdut

@miaozhixu

Using C:\Users\L.J.Y\AppData\Local\torch_extensions\torch_extensions\Cache\py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file C:\Users\L.J.Y\AppData\Local\torch_extensions\torch_extensions\Cache\py311_cu124\slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0\build.ninja...
D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py:1964: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/7] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc --generate-dependencies-with-compile --dependency-output slstm_backward.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed [...] -Xcompiler /MD -DTORCH_EXTENSION_NAME=slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H [include paths elided] -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 -std=c++17 -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 "-Xptxas -O3" --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=64 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=1 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -c E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\src\cuda\slstm_backward.cu -o slstm_backward.cuda.o
FAILED: slstm_backward.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
[2/7] FAILED: slstm_backward_cut.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
[3/7] FAILED: slstm_pointwise.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
[4/7] FAILED: blas.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
[5/7] FAILED: slstm_forward.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
[6/7] FAILED: cuda_error.cuda.o
nvcc fatal : Unknown option '-Xptxas -O3'
(the nvcc command for each file is identical apart from the source file)
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 2104, in _run_ninja_build
    subprocess.run(
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Project\pycharm\xlstm-main\main.py", line 158, in <module>
    main(cfg)
  File "E:\Project\pycharm\xlstm-main\main.py", line 54, in main
    model = xLSTMLMModel(from_dict(xLSTMLMModelConfig, OmegaConf.to_container(cfg.model))).to(
  File "E:\Project\pycharm\xlstm-main\xlstm\xlstm_lm_model.py", line 29, in __init__
    self.xlstm_block_stack = xLSTMBlockStack(config=config)
  File "E:\Project\pycharm\xlstm-main\xlstm\xlstm_block_stack.py", line 84, in __init__
    self.blocks = self._create_blocks(config=config)
  File "E:\Project\pycharm\xlstm-main\xlstm\xlstm_block_stack.py", line 105, in _create_blocks
    blocks.append(sLSTMBlock(config=config))
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\block.py", line 33, in __init__
    super().__init__(
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\xlstm_block.py", line 63, in __init__
    self.xlstm = sLSTMLayer(config=self.config.slstm)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\layer.py", line 78, in __init__
    self.slstm_cell = sLSTMCell(self.config)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\cell.py", line 780, in __new__
    return sLSTMCell_cuda(config, skip_backend_init=skip_backend_init)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\cell.py", line 690, in __init__
    self.func = sLSTMCellFuncGenerator(self.training, config)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\cell.py", line 536, in sLSTMCellFuncGenerator
    slstm_cuda = sLSTMCellCUDA.instance(config=config)
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\cell.py", line 515, in instance
    cls.mod[repr(config)] = load(
  File "E:\Project\pycharm\xlstm-main\xlstm\blocks\slstm\src\cuda_init.py", line 84, in load
    mod = _load(name + suffix, sources, **myargs)
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 1314, in load
    return _jit_compile(
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 1721, in _jit_compile
    _write_ninja_file_and_build_library(
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 1833, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "D:\SoftWare\anaconda\envs\xlstm\Lib\site-packages\torch\utils\cpp_extension.py", line 2120, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'slstm_HS64BS8NH1NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0'

This is my problem. How can I solve it, please? I have been trying for a whole day.

2022LJC avatar Dec 06 '24 10:12 2022LJC
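For reference, the fatal line in the log above is nvcc rejecting the quoted "-Xptxas -O3" flag, which reaches nvcc on Windows as one single unknown token. A hedged workaround, assuming the flag string sits in the sLSTM extension's build options somewhere under xlstm/blocks/slstm/src/ (the exact file is an assumption; locate it first rather than trusting this path):

grep -rn -- '-Xptxas -O3' xlstm/blocks/slstm/src/
# then edit the match: replace the quoted "-Xptxas -O3" with the single-token
# form -Xptxas=-O3 (or drop the flag entirely), which nvcc parses as one option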