compute-runtime
Cannot Allocate More than 8 GB on A770 16GB
I have an A770 16 GB, and I installed intel-compute-runtime 22.43.24595.30 and Intel Extension for PyTorch v1.13.10+xpu on Linux kernel 6.2rc8.
The following script tries to allocate more than 8 GB:
import torch
import torchvision
import intel_extension_for_pytorch as ipex
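size = 46000
# Each 46000 x 46000 bfloat16 tensor takes 46000 * 46000 * 2 bytes, about 4.23 GB.
w = torch.rand(size, size, dtype=torch.bfloat16, device='xpu')
# The second tensor brings the total to about 8.46 GB.
x = torch.rand(size, size, dtype=torch.bfloat16, device='xpu')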
The Python script crashes with:
RuntimeError: Native API failed. Native API returns: -997 (The plugin has emitted a backend specific error) -997 (The plugin has emitted a backend specific error)
https://github.com/intel/intel-extension-for-pytorch/issues/296
However, allocating less than 8 GB works perfectly. Why is there a memory limit on the A770 16 GB?
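For reference, the sizes in the script above can be worked out by hand. This is only a quick sketch of the arithmetic, assuming a dense tensor with 2 bytes per bfloat16 element and no allocator overhead; `tensor_bytes` is a hypothetical helper, not part of PyTorch or IPEX.

```python
# bfloat16 stores each element in 2 bytes.
BYTES_PER_BF16 = 2


def tensor_bytes(rows, cols, bytes_per_element=BYTES_PER_BF16):
    """Return the raw storage size in bytes of a dense rows x cols tensor."""
    return rows * cols * bytes_per_element


size = 46000
per_tensor = tensor_bytes(size, size)  # one of w or x
total = 2 * per_tensor                 # w and x together

print(f"one tensor : {per_tensor} bytes = {per_tensor / 1e9:.2f} GB = {per_tensor / 2**30:.2f} GiB")
print(f"two tensors: {total} bytes = {total / 1e9:.2f} GB = {total / 2**30:.2f} GiB")
```

Each tensor is 4,232,000,000 bytes, so the two together need 8,464,000,000 bytes of raw storage before any allocator overhead.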
@BA8F0D39 Let me confirm: are you using WSL?
If so, a possible cause is that the Linux subsystem follows a Windows operating system limitation. A single process would be limited to around half of the memory, ~8 GB on your 16 GB card. I am confirming this with our driver guys. Thanks.
@arthuryuan1987 I am on Arch Linux, kernel 6.2rc8, with an A770 16 GB. I am not on Windows and not on WSL.
Allocating less than 8 GB on the A770 16 GB works. Allocating more than 8 GB on the A770 16 GB does not work.
Thanks, I will file an internal issue and talk with our driver guys. I will keep you updated.
I'm having the same issue. Torch is trying to load a tensor on a 16 GB card and I get:
RuntimeError: Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY)
I patched the transformers code to report the tensor size, and it's not hitting the 4 GB limit described in the other issue. This is the state of the load when it fails; it's not able to load more than 8 GB:
ValueError: Tried to load tensor of size: 67108864 Memory allocated: 8057520640 Memory reserved: 8126464000
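To put the figures in that message in more familiar units, here is a small conversion sketch. It assumes all three numbers are byte counts; the message does not say so for the tensor size, and `to_gib` is a hypothetical helper.

```python
GIB = 2**30  # bytes per gibibyte
MIB = 2**20  # bytes per mebibyte

# Values copied from the ValueError above.
requested = 67108864     # "Tried to load tensor of size" (assumed to be bytes)
allocated = 8057520640   # "Memory allocated"
reserved = 8126464000    # "Memory reserved"


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


print(f"requested: {requested / MIB:.0f} MiB")
print(f"allocated: {to_gib(allocated):.2f} GiB")
print(f"reserved : {to_gib(reserved):.2f} GiB")
```

Under that assumption, the load stops at roughly 7.5 GiB allocated while requesting a further 64 MiB, which matches the report that nothing beyond about 8 GB can be loaded.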