
Eager tensors always report being on CPU Device despite documentation

Open garymm opened this issue 3 years ago • 5 comments

I'm playing with https://www.tensorflow.org/swift/tutorials/introducing_x10. Both locally and on Colab, the eager tensor shows up on the CPU. The text says: "If you are running this notebook on a GPU-enabled instance, you should see that hardware reflected in the device description above."

Even if I try to force it to the GPU, it seems to stay on the CPU:

let eagerGPU = Device(kind: .GPU, ordinal: 0, backend: .TF_EAGER)
let eagerTensor1 = Tensor([0.0, 1.0, 2.0], on: eagerGPU)
let eagerTensor2 = Tensor([1.5, 2.5, 3.5], on: eagerGPU)
let eagerTensorSum = eagerTensor1 + eagerTensor2
eagerTensor1.device

Output:

▿ Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER)
  - kind : TensorFlow.Device.Kind.CPU
  - ordinal : 0
  - backend : TensorFlow.Device.Backend.TF_EAGER

So I'd say there may be 2 bugs here:

  1. Either the documentation is wrong and eager tensors are only supposed to be able to use the CPU, or the documentation is right and the code is buggy and doesn't use the GPU, and
  2. If the documentation is wrong, creating a tensor on an eager GPU device should fail rather than silently run on the CPU.

garymm, Aug 26 '20 04:08

I believe this is due to a bug in the way that eager tensors report their device location. The eager tensors have their operations dispatched on the default accelerator, but always report themselves as being located on the CPU. If you run operations using them on your local machine, you can verify that they're running on the GPU by monitoring GPU activity via nvidia-smi or similar tools.

Likewise, eager tensors currently ignore the device you specify for them, so if you tell them to run on the CPU when there's a GPU available, they'll still run on the GPU.

X10 tensors report the device they're attached to accurately and respect manual device placement; only eager tensors have this problem.
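To see the difference side by side, something like the following should work. This is only a minimal sketch: it assumes the same toolchain as the tutorial, and the variable names are made up for illustration. What gets printed depends on the hardware available, so no expected output is shown.

```swift
import TensorFlow

let values: [Float] = [0.0, 1.0, 2.0]

// Eager tensor with an explicit CPU request. Per the explanation above, the
// request is currently ignored, and the reported device is CPU no matter
// where the ops are actually dispatched.
let eagerCPU = Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER)
let eagerTensor = Tensor(values, on: eagerCPU)

// X10 tensor on the default XLA device (an accelerator, if one is available).
let x10Tensor = Tensor(values, on: Device.defaultXLA)

// Compare what each backend reports.
print("eager reports:", eagerTensor.device)
print("X10 reports:  ", x10Tensor.device)
```

On a GPU machine, watching nvidia-smi while running a larger eager computation is still the way to confirm where the eager ops actually execute.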

BradLarson, Aug 26 '20 14:08

It looks like this line is always returning the CPU device. I'll figure out how to surface the actual device being used.

texasmichelle, Sep 04 '20 00:09

Once swift-apis#1156 is merged, TFE_TensorHandleDeviceType and TFE_TensorHandleDeviceID will be available, making this a straightforward fix.

texasmichelle, Dec 22 '20 22:12

Tentative changes here.

texasmichelle, Dec 23 '20 00:12

I ran into a problem adding eager/c_api_experimental.h: the TFE_CustomDevice struct uses C++ syntax (a default member initializer), which is rejected when the header is parsed as C.

/home/michellecasbon/repos/out/libtensorflow-prefix/src/libtensorflow/tensorflow/c/eager/c_api_experimental.h:446:14: error: expected ';' at end of declaration list
  int version = TFE_CUSTOM_DEVICE_VERSION;
             ^

It's unclear how to get around this without pursuing custom import rules or modifying upstream.

texasmichelle, Dec 23 '20 02:12