
How to import self_ref_v0.cu?

Rogerspy opened this issue 3 years ago • 2 comments

I extracted `self_ref_v0.cu` and `SRWMlayer` from the source code and incorporated them into my own model. I get:

ImportError: cannot import name 'self_ref' from 'xx.layers'

My package structure is as follows:

layers
     __init__.py
     self_ref_v0.cu
     self_ref_layer.py
  • `self_ref_v0.cu` is the file extracted from this repo
  • `self_ref_layer.py` contains the `SRWMlayer` class:
     from . import self_ref_v0
    
     class SRWMlayer(...):
         ...
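
For reference, a `.cu` file is CUDA source code, not a Python module, so `from . import self_ref_v0` cannot work as written; the kernel first has to be compiled into an extension. Below is a minimal sketch of JIT-compiling it with `torch.utils.cpp_extension.load`, assuming the `.cu` file already contains its own pybind11 bindings; the extension name and the wrapping step are illustrative, not necessarily how this repo loads the kernel:

```
# Sketch only: JIT-compile the CUDA source into an importable extension
# (requires nvcc). The extension name 'self_ref' is an arbitrary choice.
import os
from torch.utils.cpp_extension import load

this_dir = os.path.dirname(os.path.abspath(__file__))

self_ref = load(
    name='self_ref',
    sources=[os.path.join(this_dir, 'self_ref_v0.cu')],
    verbose=True,
)

# The compiled module's kernels can then be wrapped in a torch.autograd.Function
# inside self_ref_layer.py instead of doing `from . import self_ref_v0`.
```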
    

Rogerspy avatar Jun 14 '22 03:06 Rogerspy


I solved this problem. But I get another error:

    ...
    out = out.reshape(bsz, slen, self.num_head * self.dim_head)
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  • pytorch == 1.11.0
  • bsz=20, slen=512, self.num_head * self.dim_head = 768

I tried different parameters (slen, num_head, dim_head), but the same error always occurs.

Rogerspy avatar Jun 14 '22 03:06 Rogerspy

I'm not sure exactly which parameter is causing the error in your case. You should first try a much smaller setting to see if that works, and then try larger parameters to find the problematic one. If this still does not resolve the issue, I guess you should follow the error message, i.e., run the code with "CUDA_LAUNCH_BLOCKING=1"...
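
For the debugging step, note that `CUDA_LAUNCH_BLOCKING` has to be set before the first CUDA call. Exporting it in the shell before launching the script works; one way to do it from inside the script is sketched below (the placement before `import torch` is simply the safest spot):

```
# Make CUDA kernel launches synchronous so the reported stack trace points at
# the actual failing launch. The variable is read when CUDA is initialized,
# so set it before the first CUDA call (before importing torch is safest).
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch  # imported after setting the variable
```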

One thing to note is that in the current implementation, the head dimension (dim_head) cannot be too big (due to the shared memory limit). To train large models, the number of heads (num_head) has to be increased while keeping dim_head small. The rule of thumb I use (on 2080, P100, and V100 GPUs) is to keep dim_head < 64. For example, to get a total hidden layer size of 768, I'd set dim_head = 48 and num_head = 16. This works for me on a 2080, while (dim_head = 96, num_head = 8) doesn't. This limit depends on the type of GPU you are using.
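
To illustrate that rule of thumb, here is a small, hypothetical helper (not part of this repo) that picks the largest dim_head below a given limit for a target hidden size; the limit of 64 follows the rule of thumb above, and the real constraint depends on the GPU's shared memory:

```
def split_heads(hidden_size, max_dim_head=64):
    """Pick (num_head, dim_head) such that num_head * dim_head == hidden_size
    and dim_head < max_dim_head, preferring the largest valid dim_head."""
    for dim_head in range(max_dim_head - 1, 0, -1):
        if hidden_size % dim_head == 0:
            return hidden_size // dim_head, dim_head
    return hidden_size, 1  # fall back to one-dimensional heads


print(split_heads(768))  # (16, 48): matches the dim_head = 48, num_head = 16 example above
```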

kazuki-irie avatar Jun 14 '22 10:06 kazuki-irie