RuntimeError: CUDA error: no kernel image is available for execution on the device
is a p5200 enough for this?
Traceback (most recent call last):
  File "/home/user/mamba/simplermambassm.py", line 259, in ...
  [rest of traceback truncated]
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
@tridao (I am not sure if this is just a hack, but for us old guys with CCC < 7, can we do this?)
I see that the Quadro P5200 has CUDA Compute Capability 6.1. I saw the same error with my GeForce GTX 1070 (also Compute Capability 6.1).
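(You can confirm a card's compute capability directly from PyTorch:)
import torch
# prints the CUDA compute capability of the current device, e.g. (6, 1)
print(torch.cuda.get_device_capability())
print(torch.cuda.get_device_name())  # e.g. 'Quadro P5200'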
I was able to fix it by compiling the causal-conv1d dependency from source, as follows:
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
# check out the latest version that Mamba supports:
git checkout v1.0.2
# edit setup.py to add these lines:
cc_flag.append("-gencode")
cc_flag.append("arch=compute_60,code=sm_60")
Here is where you need to add those lines.
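For reference, the patched region of setup.py ends up looking roughly like this; the surrounding lines are from memory and may differ between versions:
# in causal-conv1d's setup.py, where the -gencode flags are assembled:
cc_flag.append("-gencode")
cc_flag.append("arch=compute_60,code=sm_60")  # added: Pascal (P5200 / GTX 1070 / P40)
cc_flag.append("-gencode")
cc_flag.append("arch=compute_70,code=sm_70")  # existing entries continue unchanged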
Then, compile it from source with:
CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
You can use the following script to test whether it is working properly:
import torch
from causal_conv1d import causal_conv1d_fn

# arbitrary small shapes; the only goal is to get the CUDA kernel to launch
batch, dim, seq, width = 10, 5, 17, 4
x = torch.zeros((batch, dim, seq)).to('cuda')
weight = torch.zeros((dim, width)).to('cuda')
bias = torch.zeros((dim, )).to('cuda')
causal_conv1d_fn(x, weight, bias, None)  # the trailing None is the activation
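If the extension was built without your GPU's architecture, this call is exactly where the "no kernel image is available" RuntimeError fires, so it doubles as a minimal reproducer.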
EDIT: Just realized the Mamba repo also assumes CCC >= 7. So, I did a similar edit to the mamba setup.py and compiled it with:
MAMBA_FORCE_BUILD=TRUE pip install .
(This takes about 10 minutes to compile)
After doing this, the top-level Mamba demo works:
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba(
    # This module uses roughly 3 * expand * d_model^2 parameters
    d_model=dim,  # Model dimension d_model
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # Local convolution width
    expand=2,     # Block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape
>>> y = model(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/mamba_ssm/modules/mamba_simple.py", line 149, in forward
    out = mamba_inner_fn(
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 306, in mamba_inner_fn
    return MambaInnerFn.apply(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 113, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 181, in forward
    conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(x, conv1d_weight, conv1d_bias, True)
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: Optional[torch.Tensor], arg4: bool) -> torch.Tensor
Invoked with: tensor([[[-0.4806, 1.2685, 0.3929, ...]]], device='cuda:0', requires_grad=True), tensor([[-0.0555, 0.4169, 0.2594, -0.4943], ...], device='cuda:0', requires_grad=True), Parameter containing: tensor([-3.1444e-01, 4.3207e-02, ...], device='cuda:0', requires_grad=True), True
[full tensor dumps elided; note that four arguments were passed where the binding expects five]
>>>
oops, I did it out of order. Never mind: it still produced the same error after applying the same process to mamba's setup.py.
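(For anyone hitting the same TypeError: it is an API mismatch between the installed causal-conv1d and the Mamba release calling into it, not a CUDA problem. You can inspect which signature your compiled binding actually exposes, since pybind11 records it in the docstring:)
import causal_conv1d_cuda  # the compiled extension Mamba calls into
# pybind11 embeds the accepted signature(s) in the function docstring
print(causal_conv1d_cuda.causal_conv1d_fwd.__doc__)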
Fyi for us newbs
CCC stands for "CUDA Compute Capability," a version number that represents the features supported by a piece of CUDA (Compute Unified Device Architecture) hardware, typically a GPU. CUDA is a parallel computing platform and application programming interface (API) created by Nvidia that lets software developers use a CUDA-enabled GPU for general-purpose processing (an approach known as GPGPU, General-Purpose computing on Graphics Processing Units).
The Compute Capability is a version number indicating the features supported by the GPU. Different versions of CUDA GPUs support different features and therefore have different Compute Capabilities. For example, the Quadro P5200 and GeForce GTX 1070 GPUs mentioned have a Compute Capability of 6.1. This version number is important for developers because they need to compile their programs for a specific Compute Capability to ensure compatibility and optimal performance on the target GPU.
When you modify a setup.py file of a Python package to include specific Compute Capability flags, you are instructing the compiler to generate code optimized for GPUs with that particular Compute Capability. This is often necessary when working with older GPUs or when the pre-compiled binaries of a library do not support the specific Compute Capability of your GPU.
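As a concrete check, you can list which architectures your PyTorch build itself was compiled for and compare against your GPU:
import torch
# e.g. ['sm_50', 'sm_60', 'sm_70', ...]; if your card's sm_XX is missing
# from an extension's build, you get the "no kernel image" error
print(torch.cuda.get_arch_list())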
btw, I had to do something similar to get ctransformers to work
oops, sorry, but I forgot a crucial thing: Mamba states that it requires causal_conv1d version <= 1.0.2. So you need to do a git checkout v1.0.2 before you do the pip install. From where you are now, I'd say it would be:
$ cd causal-conv1d
$ git checkout v1.0.2
# you've already edited the setup.py file I assume
$ pip uninstall causal-conv1d
$ CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
At this point, it may work ;) Since Mamba dynamically loads the causal-conv1d python module, no re-compilation of Mamba should be necessary, but I am not positive of that.
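(One way to confirm which build actually gets loaded, in case multiple installs are lying around:)
import causal_conv1d, causal_conv1d_cuda
print(causal_conv1d.__file__)       # should point into the rebuilt install
print(causal_conv1d_cuda.__file__)  # the compiled extension that Mamba loads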
(i edited the original instruction to reflect this just now)
Sorry I'm traveling this week but will have time to look into this next week.
Processing /home/user/mamba/causal-conv1d
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
Traceback (most recent call last):
  File "/home/user/lit-gpt/env/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/home/user/lit-gpt/env/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/home/user/lit-gpt/env/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/tmp/pip-build-env-w4x0ekut/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
  File "/tmp/pip-build-env-w4x0ekut/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
    self.run_setup()
  File "/tmp/pip-build-env-w4x0ekut/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 480, in run_setup
    super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
  File "/tmp/pip-build-env-w4x0ekut/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 9, in <module>
ModuleNotFoundError: No module named 'packaging'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
despite installing python3-packaging and pip install packaging (and I can confirm I can import packaging; pip builds in an isolated environment, so packages installed in the venv aren't visible to setup.py there, which is why pip install . --no-build-isolation is another way around this)
nm, got past it with:
pip install wheel
python setup.py
yay, that did it; back in the game =D
Oh, God, I solved it by following the instructions above! Love from a P40 (CCC 6.1)!!!
this solution no longer works for the latest mamba build
(mamba-venv) [root@pve-m7330 mamba]# python
Python 3.10.9 (main, Mar 8 2023, 10:47:38) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from mamba_ssm import Mamba
/home/user/mamba/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
[the same FutureWarning repeats for the custom_fwd/custom_bwd definitions in selective_scan_interface.py, ops/triton/layer_norm.py, distributed/tensor_parallel.py, and ops/triton/ssd_combined.py]
>>> batch, length, dim = 2, 64, 16
>>> x = torch.randn(batch, length, dim).to("cuda")
>>> model = Mamba(
...     # This module uses roughly 3 * expand * d_model^2 parameters
...     d_model=dim,  # Model dimension d_model
...     d_state=16,   # SSM state expansion factor
...     d_conv=4,     # Local convolution width
...     expand=2,     # Block expansion factor
... ).to("cuda")
>>> y = model(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/mamba-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/mamba-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/mamba/mamba_ssm/modules/mamba_simple.py", line 146, in forward
    out = mamba_inner_fn(
  File "/home/user/mamba/mamba_ssm/ops/selective_scan_interface.py", line 317, in mamba_inner_fn
    return MambaInnerFn.apply(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
  File "/home/user/mamba-venv/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/user/mamba-venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 455, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/user/mamba/mamba_ssm/ops/selective_scan_interface.py", line 187, in forward
    conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: bool) -> torch.Tensor
Invoked with: tensor([...], device='cuda:0', requires_grad=True), tensor([...], device='cuda:0', requires_grad=True), Parameter containing: tensor([...], device='cuda:0', requires_grad=True), None, None, None, True
[full tensor dumps elided; seven arguments were passed where the old binding expects four]
>>> assert y.shape == x.shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'y' is not defined
I'm going to try a newer causal-conv1d than 1.0.2, but these instructions used to work... why not include compute 6.0 in the source rather than have us patch it in? Compute 5.3 is in there.
disregard: it works when I compile causal-conv1d v1.4.0 with compute_60 patched into setup.py
although this makes a good argument for including it by default, no?
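For reference, a minimal smoke test that should be version-agnostic, since it only passes the required arguments (a sketch, not tested against every release):
import torch
from causal_conv1d import causal_conv1d_fn

x = torch.randn(2, 8, 32, device='cuda')   # (batch, dim, seqlen)
weight = torch.randn(8, 4, device='cuda')  # (dim, width)
out = causal_conv1d_fn(x, weight)          # bias and activation default to None
assert out.shape == x.shape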
unfortunately, when trying to import torch I now get:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
I would think that if this patch were better integrated, this wouldn't happen.
nevermind, I fixed that with pip install numpy==1.*
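For completeness, a one-line check that the pin took effect:
import numpy
print(numpy.__version__)  # should now report a 1.x release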