Could not run 'xformers::efficient_attention_forward_generic' with arguments from the 'CUDA' backend - USE_MEMORY_EFFICIENT_ATTENTION over gunicorn
❓ Questions and Help
When I tried to turn on USE_MEMORY_EFFICIENT_ATTENTION=1 with gunicorn I got this:

NotImplementedError: Could not run 'xformers::efficient_attention_forward_generic' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_generic' is only available for these backends: [UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

But it works fine with USE_MEMORY_EFFICIENT_ATTENTION=1 python script.py.
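For reference, a standalone check along these lines runs without the error when launched with plain python (a minimal sketch of what such a script could look like, calling xformers.ops.memory_efficient_attention directly; the file name, tensor shapes, and dtype are assumptions, not my actual script.py):

```python
# check_xformers.py -- hypothetical standalone check, not the actual script.py
import torch
import xformers.ops as xops

# Small random query/key/value tensors; the shapes here are just an example.
q = torch.randn(1, 128, 64, device="cuda")
k = torch.randn(1, 128, 64, device="cuda")
v = torch.randn(1, 128, 64, device="cuda")

# This goes through the memory-efficient attention op named in the error above.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape)
```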
Hi @archywillhe, and thanks for opening this issue. It looks like there is a problem with your xformers/PyTorch setup. You mention gunicorn: have you tried running your code outside it to isolate the problem?
@danthe3rd
yes, outside of gunicorn it works well; I only encounter this when running inside gunicorn (maybe I need to do some config?)
Is it possible to share a reproducible example, like a minimal version of script.py, if you can?
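Even something as small as a single WSGI handler served by gunicorn would help. As a sketch of what such a repro could look like (the file name app.py and the direct xformers.ops call are assumptions, not code from this thread):

```python
# app.py -- hypothetical minimal repro, run with something like:
#   USE_MEMORY_EFFICIENT_ATTENTION=1 gunicorn app:app
import torch
import xformers.ops as xops


def app(environ, start_response):
    # Call the same memory-efficient attention op inside a gunicorn worker.
    q = torch.randn(1, 128, 64, device="cuda")
    k = torch.randn(1, 128, 64, device="cuda")
    v = torch.randn(1, 128, 64, device="cuda")
    out = xops.memory_efficient_attention(q, k, v)

    body = str(out.shape).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```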