Could not run 'xformers::efficient_attention_forward_generic' with arguments from the 'CUDA' backend - USE_MEMORY_EFFICIENT_ATTENTION over gunicorn
❓ Questions and Help
When I tried to turn on USE_MEMORY_EFFICIENT_ATTENTION=1 with gunicorn I got this:

NotImplementedError: Could not run 'xformers::efficient_attention_forward_generic' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_generic' is only available for these backends: [UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

But it works fine with USE_MEMORY_EFFICIENT_ATTENTION=1 python script.py.
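For reference, a standalone check along these lines runs without the error when launched with plain python (a minimal sketch of what such a script could look like, calling xformers.ops.memory_efficient_attention directly; the file name, tensor shapes, and dtype are assumptions, not my actual script.py):

```python
# check_xformers.py -- hypothetical standalone check, not the actual script.py
import torch
import xformers.ops as xops

# Small random query/key/value tensors; the shapes here are just an example.
q = torch.randn(1, 128, 64, device="cuda")
k = torch.randn(1, 128, 64, device="cuda")
v = torch.randn(1, 128, 64, device="cuda")

# This goes through the memory-efficient attention op named in the error above.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape)
```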
Hi @archywillhe, and thanks for opening this issue. It looks like there is a problem with your xformers/PyTorch setup. You mention gunicorn: have you tried running your code outside it to isolate the problem?
@danthe3rd
yes, outside of gunicorn it works well; I only encounter this when running inside gunicorn (maybe I need to do some config?)
Is it possible to share a reproducible example, like a minimal version of script.py, if you can?
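Even something as small as a single WSGI handler served by gunicorn would help. As a sketch of what such a repro could look like (the file name app.py and the direct xformers.ops call are assumptions, not code from this thread):

```python
# app.py -- hypothetical minimal repro, run with something like:
#   USE_MEMORY_EFFICIENT_ATTENTION=1 gunicorn app:app
import torch
import xformers.ops as xops


def app(environ, start_response):
    # Call the same memory-efficient attention op inside a gunicorn worker.
    q = torch.randn(1, 128, 64, device="cuda")
    k = torch.randn(1, 128, 64, device="cuda")
    v = torch.randn(1, 128, 64, device="cuda")
    out = xops.memory_efficient_attention(q, k, v)

    body = str(out.shape).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```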