
Cannot Allocate More than 8 GB on A770 16GB

Open BA8F0D39 opened this issue 2 years ago • 4 comments

I have an A770 16 GB and I installed intel-compute-runtime 22.43.24595.30 and Intel Extension for PyTorch v1.13.10+xpu on Linux kernel 6.2-rc8.

When I try to allocate more than 8 GB:

import torch
import torchvision  # not used below, but imported in the original repro
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device

size = 46000
# two large bfloat16 tensors on the GPU
w = torch.rand(size, size, dtype=torch.bfloat16, device='xpu')
x = torch.rand(size, size, dtype=torch.bfloat16, device='xpu')

The Python script crashes with:

RuntimeError: Native API failed. Native API returns: -997 (The plugin has emitted a backend specific error) -997 (The plugin has emitted a backend specific error)
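For scale: each bfloat16 tensor above is 46000 × 46000 × 2 bytes ≈ 4.23 GB, so the second allocation pushes the total to ≈ 8.46 GB, just past the apparent 8 GB ceiling. A quick check in plain Python (no GPU needed):

```python
def bf16_tensor_bytes(n):
    # bfloat16 stores 2 bytes per element; an (n, n) tensor has n * n elements
    return n * n * 2

per_tensor = bf16_tensor_bytes(46000)
print(per_tensor)      # 4232000000 bytes, about 4.23 GB
print(2 * per_tensor)  # 8464000000 bytes, about 8.46 GB for both tensors
```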

https://github.com/intel/intel-extension-for-pytorch/issues/296

However, allocating less than 8 GB works perfectly. Why is there a memory limit on the A770 16 GB?

BA8F0D39 avatar Feb 19 '23 05:02 BA8F0D39

@BA8F0D39 Let me confirm: are you using WSL?

If yes, the possible cause is that the Linux subsystem inherits a Windows operating system limitation: a single process can allocate only around half of the device memory, i.e. ~8 GB on your 16 GB card. I am confirming this with our driver team. Thanks.

fengyuan14 avatar Feb 20 '23 06:02 fengyuan14

@arthuryuan1987 I am on Arch Linux, kernel 6.2-rc8, with an A770 16 GB. Not on Windows, not on WSL.

Allocating less than 8 GB on the A770 16 GB works; allocating more than 8 GB does not.

BA8F0D39 avatar Feb 20 '23 20:02 BA8F0D39

> @arthuryuan1987 I am on Arch Linux Kernel 6.2rc8 with A770 16 GB. Not on Windows. Not on WSL
>
> Allocating less than 8GB on A770 16GB works. Allocating more than 8GB on A770 16GB does not work

Thanks, I will file an internal issue and talk with our driver team. I will keep you updated.

fengyuan14 avatar Feb 22 '23 02:02 fengyuan14

I'm having the same issue. Torch is trying to load a tensor onto a 16 GB card and I get:

RuntimeError: Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY)

I patched the transformers code to report the tensor size, and it's not hitting the 4 GB limit described in the other issue. This is the state at the point of failure; it's not able to allocate more than 8 GB:

ValueError: Tried to load tensor of size: 67108864 Memory allocated: 8057520640 Memory reserved: 8126464000
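A minimal sketch of the kind of pre-load check behind that message (the helper name and the fixed 8 GB budget are assumptions for illustration; the actual patch lives in transformers' weight-loading code):

```python
def check_tensor_fits(tensor_bytes, allocated_bytes, budget_bytes):
    # Hypothetical guard: fail fast with a readable message instead of
    # letting the native allocator die with PI_ERROR_OUT_OF_HOST_MEMORY.
    if allocated_bytes + tensor_bytes > budget_bytes:
        raise ValueError(
            f"Tried to load tensor of size: {tensor_bytes} "
            f"Memory allocated: {allocated_bytes}"
        )
    return allocated_bytes + tensor_bytes

# The numbers from the report: a 64 MB tensor on top of ~8 GB already
# allocated, against the observed ~8 GB per-process ceiling.
try:
    check_tensor_fits(67108864, 8057520640, 8 * 10**9)
except ValueError as e:
    print(e)
```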

nathanodle avatar Feb 17 '24 23:02 nathanodle