modern-srwm
How to import self_ref_v0.cu?
I extracted `self_ref_v0.cu` and the `SRWMlayer` from the source code and incorporated them into my own model. I get:

```
ImportError: cannot import name 'self_ref' from 'xx.layers'
```

My package structure is as follows:

```
layers
    __init__.py
    self_ref_v0.cu
    self_ref_layer.py
```

* `self_ref_v0.cu` is the file extracted from this repo
* `self_ref_layer.py` is the `SRWMlayer` package:

```
from . import self_ref_v0

class SWRMlayer(...):
    ...
```
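Since a raw `.cu` file cannot be imported as a Python module, one common way to make it usable is to JIT-compile it when the package is imported, e.g. in `layers/__init__.py`. Below is a minimal sketch of that approach; it assumes `self_ref_v0.cu` exposes its functions through a `PYBIND11_MODULE` block, and the module name `self_ref` simply mirrors the error message above rather than the repo's actual build setup:

```python
# layers/__init__.py -- illustrative sketch, not the repo's actual build code
from pathlib import Path

from torch.utils.cpp_extension import load

_here = Path(__file__).resolve().parent

# Compile self_ref_v0.cu on first import and expose it as `self_ref`,
# so that `from xx.layers import self_ref` works afterwards.
self_ref = load(
    name="self_ref",
    sources=[str(_here / "self_ref_v0.cu")],
    verbose=True,
)
```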
I solved this problem, but I get another error:

```
...
    out = out.reshape(bsz, slen, self.num_head * self.dim_head)
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
- pytorch == 1.11.0
- bsz=20, slen=512, self.num_head * self.dim_head = 768
I tried different parameters (slen, num_head, dim_head), but the same error always occurs.
I'm not sure exactly which parameter is causing the error in your case. You should first try a much smaller setting to see if that works, and then try larger parameters to find the problematic one. If this still does not resolve the issue, I guess you should follow the error message, i.e., run the code with "CUDA_LAUNCH_BLOCKING=1"...
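For reference, `CUDA_LAUNCH_BLOCKING=1` can be set on the command line or inside the script, as long as it happens before the first CUDA call; `train.py` below is just a placeholder name:

```python
# Shell: CUDA_LAUNCH_BLOCKING=1 python train.py   (train.py is a placeholder)
# Or set the flag in Python before any CUDA work is done:
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # make kernel launches synchronous

import torch  # the flag is read when CUDA is first initialized
```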
One thing to note is that in the current implementation, the head dimension (dim_head) cannot be too big (due to the shared memory limit). To train large models, the number of heads (num_head) has to be increased while keeping dim_head small. The rule of thumb I use (on 2080, P100, and V100 GPUs) is to keep dim_head < 64. For example, to get a total hidden layer size of 768, I'd set dim_head = 48 and n_head = 16. This works for me on a 2080 while (dim_head = 96, n_head = 8) doesn't. This limit depends on the type of GPU you are using.
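To make that rule of thumb concrete, here is a small hypothetical helper (not part of the repo) that picks the largest `dim_head` below 64 that evenly divides the desired hidden size:

```python
from typing import Tuple


def split_heads(hidden_size: int, max_dim_head: int = 64) -> Tuple[int, int]:
    """Return (num_head, dim_head) with num_head * dim_head == hidden_size
    and dim_head kept below the shared-memory-safe bound (< 64 here)."""
    for dim_head in range(max_dim_head - 1, 0, -1):
        if hidden_size % dim_head == 0:
            return hidden_size // dim_head, dim_head
    raise ValueError(f"no valid head split for hidden_size={hidden_size}")


print(split_heads(768))  # (16, 48) -- the setting reported to work on a 2080
```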