swift-apis

Expose all devices.

Open texasmichelle opened this issue 3 years ago • 1 comment

On a machine with a GPU or TPU, I get a segfault if I try to use a Device of kind CPU on the XLA backend, e.g.:

let device = Device(kind: .CPU, ordinal: 0, backend: .XLA)
let t1 = Tensor([1, 1, 0], on: device)
let t2 = Tensor([1, 1, 0], on: device)
t1 + t2
2020-08-10 15:43:18.077050: E tensorflow/compiler/xla/xla_client/tf_logging.cc:23] Check failed: it != device_contexts_.end() 
*** Begin stack trace ***
	
	
	
	
	copyTensor
	
	
	
	
	$sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
	$s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
	$s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC
	
*** End stack trace ***
No such device: CPU:0
2020-08-10 15:43:18.077121: F tensorflow/compiler/xla/xla_client/tf_logging.cc:26] tensorflow/compiler/tf2xla/xla_tensor/tensor.cpp:419 : Check failed: it != device_contexts_.end() 
*** Begin stack trace ***
	
	
	
	
	copyTensor
	
	
	
	
	$sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
	$s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
	$s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC
	
*** End stack trace ***
No such device: CPU:0
Current stack trace:
	frame #21: 0x00007fb3999eb113 $__lldb_expr218`main at <Cell 28>:2

A workaround is to set the XRT_DEVICE_MAP environment variable, but all device and backend combinations should be accessible without this.

See swift-models/#654.

texasmichelle Aug 10 '20 23:08

As examples of how these mappings are defined at the command line, here's how you would expose both the CPU and GPU as selectable devices (assuming a single CPU and GPU):

export XRT_DEVICE_MAP='CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0|GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0'

and here's how you would expose two GPUs (not exposing the CPU):

export XRT_DEVICE_MAP='GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0|GPU:1;/job:localservice/replica:0/task:0/device:XLA_GPU:1'
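
For machines with more devices, writing these strings by hand gets error-prone. Here is a minimal sketch of a shell helper that assembles the value from the entry format used in the two examples above. The function name `build_xrt_device_map` is hypothetical (it is not part of swift-apis), and it assumes every device lives under the same `/job:localservice/replica:0/task:0` prefix.

```shell
# Hypothetical helper (not part of swift-apis): build an XRT_DEVICE_MAP value
# for a given number of CPUs and GPUs, using the entry format shown above.
# Assumes all devices share the /job:localservice/replica:0/task:0 prefix.
build_xrt_device_map() {
  num_cpus="$1"
  num_gpus="$2"
  prefix='/job:localservice/replica:0/task:0/device'
  map=''
  for kind in CPU GPU; do
    if [ "$kind" = CPU ]; then count="$num_cpus"; else count="$num_gpus"; fi
    i=0
    while [ "$i" -lt "$count" ]; do
      # Each entry has the form "<name>;<XLA device path>".
      entry="${kind}:${i};${prefix}:XLA_${kind}:${i}"
      # Entries are joined with '|'.
      if [ -n "$map" ]; then map="${map}|${entry}"; else map="$entry"; fi
      i=$((i + 1))
    done
  done
  printf '%s\n' "$map"
}

# Example: expose one CPU and two GPUs.
export XRT_DEVICE_MAP="$(build_xrt_device_map 1 2)"
```

Calling `build_xrt_device_map 1 1` and `build_xrt_device_map 0 2` should reproduce the two values shown above.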

Currently, only one default device is found and exposed. If you want anything other than the default, you need to manually specify the XLA -> S4TF mapping for every device you want to use. The devices are parsed from the XRT_DEVICE_MAP environment variable within ParseEnvDevices. That may be the place to add CPU support on GPU-default systems, because we can safely assume a CPU is present there.
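
Until that is in place, it can help to check which device names a given map actually exposes before launching Swift. The sketch below only splits the string on `|` and `;`, following the format of the examples above. It is not how ParseEnvDevices is implemented, and both function names are hypothetical.

```shell
# Hypothetical helper: print the S4TF-side device names (the part before ';')
# from an XRT_DEVICE_MAP value, one per line.
list_mapped_devices() {
  printf '%s\n' "$1" | tr '|' '\n' | cut -d ';' -f 1
}

# Succeeds if the device name in $1 (e.g. CPU:0) appears in the map in $2.
device_is_mapped() {
  list_mapped_devices "$2" | grep -qxF "$1"
}

# Warn when CPU:0 has not been mapped explicitly. With no map set, only the
# default device is exposed, so this is a hint rather than a definite error.
if ! device_is_mapped 'CPU:0' "${XRT_DEVICE_MAP:-}"; then
  echo 'CPU:0 is not listed in XRT_DEVICE_MAP' >&2
fi
```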

BradLarson Aug 19 '20 17:08