xla [PJRT:GPU] Add setting for mocked number of hosts per slice

jaro-sevcik opened this issue 1 year ago.

With the existing enable_mock_nccl setting, it is impossible to warm up the compilation cache when there are multiple processes per node. The cache key includes the topology, and the GPU topology records the number of slices and the number of hosts per slice. The current topology mocking always sets num_hosts_per_slice to 1. However, if a node has multiple GPUs and you run one process per GPU, num_hosts_per_slice must equal the number of GPUs on the node. Executables compiled under the mocked topology are therefore stored under a different cache key than the one the real multi-process run looks up, so the warmed-up entries are never hit.

This patch allows setting num_hosts_per_slice explicitly when creating the GPU client.
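A minimal sketch of the mismatch and of the fix, for illustration only. `Topology`, `real_topology`, `mocked_topology` and `cache_key` are hypothetical names, not XLA or PJRT APIs. The sketch assumes that each node forms one slice and that the cache key hashes the program together with the topology fields.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical stand-in for the topology fields that feed the cache key."""
    num_slices: int
    num_hosts_per_slice: int
    num_devices_per_host: int


def real_topology(num_nodes: int, gpus_per_node: int) -> Topology:
    # Process-per-GPU: each GPU is driven by its own process ("host"),
    # and each node is assumed to form one slice.
    return Topology(num_slices=num_nodes,
                    num_hosts_per_slice=gpus_per_node,
                    num_devices_per_host=1)


def mocked_topology(num_nodes: int, gpus_per_node: int,
                    num_hosts_per_slice: int = 1) -> Topology:
    # Hypothetical mock: the default of 1 mirrors the old, fixed behaviour;
    # passing an explicit value mirrors the new setting.
    if gpus_per_node % num_hosts_per_slice != 0:
        raise ValueError("gpus_per_node must be divisible by num_hosts_per_slice")
    return Topology(num_slices=num_nodes,
                    num_hosts_per_slice=num_hosts_per_slice,
                    num_devices_per_host=gpus_per_node // num_hosts_per_slice)


def cache_key(program: str, topo: Topology) -> str:
    # Hypothetical key: hash of the program text plus the topology fields.
    h = hashlib.sha256()
    h.update(program.encode())
    h.update(repr(topo).encode())
    return h.hexdigest()


real = real_topology(num_nodes=2, gpus_per_node=8)
old_mock = mocked_topology(num_nodes=2, gpus_per_node=8)
new_mock = mocked_topology(num_nodes=2, gpus_per_node=8, num_hosts_per_slice=8)

# The old mock yields a different key, so a warmed-up entry would be missed.
print(cache_key("prog", old_mock) == cache_key("prog", real))  # False
# An explicit num_hosts_per_slice reproduces the real topology and its key.
print(cache_key("prog", new_mock) == cache_key("prog", real))  # True
```

With 2 nodes of 8 GPUs each, the old mock describes 1 host per slice with 8 devices each. The real process-per-GPU run has 8 hosts per slice with 1 device each. Only the explicit setting makes the two topologies, and therefore the keys, agree.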