dpctl
ONEAPI_DEVICE_SELECTOR not working under sub-process
My environment:
export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
$ sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:2] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:3] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:4] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:5] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
I have the following code in the file test_multi_process.py:
import os
from multiprocessing import Pool
import time
import dpctl

env_var = "ONEAPI_DEVICE_SELECTOR"
backend = "level_zero"
device_type = "gpu"

def get_dev():
    dev_ids = []
    for dev in dpctl.get_devices(backend=backend, device_type=device_type):
        # device filter_string with format: "backend:device_type:relative_id"
        # dev_ids.append(int(dev.filter_string.split(":")[-1]))
        dev_ids.append(dev.filter_string)
    return dev_ids

def set_env_var(dev_id):
    env_val = f"{backend}:{dev_id}"
    print(f"[func] [{os.getpid()}] set {env_var} = {env_val}")
    os.environ[env_var] = env_val

def func(x):
    print(f"[func] [{os.getpid()}] >>>>>>>>>>>>>>>>>>>>>>")
    set_env_var(x)
    env_val = os.environ.get(env_var, None)
    print(f"[func] [{os.getpid()}] x = {x}")
    print(f"[func] [{os.getpid()}] {env_var} = {env_val}")
    dev_ids = get_dev()
    print(f"[func] [{os.getpid()}] dev_ids = {dev_ids}")
    print(f"[func] [{os.getpid()}] <<<<<<<<<<<<<<<<<<<<<<\n")

def main():
    env_val = os.environ.get(env_var, None)
    print(f"[main] [{os.getpid()}] {env_var} = {env_val}")
    dev_ids = get_dev()
    print(f"[main] [{os.getpid()}] dev_ids = {dev_ids}")
    with Pool(5) as p:
        p.map(func, [1, 2, 3, 4, 5])

if __name__ == '__main__':
    main()
Running test_multi_process.py:
$ python test_multi_process.py
produces the following logs:
[main] [122421] ONEAPI_DEVICE_SELECTOR = level_zero:gpu
[main] [122421] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
# sub process 1
[func] [122440] >>>>>>>>>>>>>>>>>>>>>>
[func] [122440] set ONEAPI_DEVICE_SELECTOR = level_zero:1
[func] [122440] x = 1
[func] [122440] ONEAPI_DEVICE_SELECTOR = level_zero:1
[func] [122440] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122440] <<<<<<<<<<<<<<<<<<<<<<
# sub process 2
[func] [122441] >>>>>>>>>>>>>>>>>>>>>>
[func] [122441] set ONEAPI_DEVICE_SELECTOR = level_zero:2
[func] [122441] x = 2
[func] [122441] ONEAPI_DEVICE_SELECTOR = level_zero:2
[func] [122441] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122441] <<<<<<<<<<<<<<<<<<<<<<
# sub process 3
[func] [122442] >>>>>>>>>>>>>>>>>>>>>>
[func] [122442] set ONEAPI_DEVICE_SELECTOR = level_zero:3
[func] [122442] x = 3
[func] [122442] ONEAPI_DEVICE_SELECTOR = level_zero:3
[func] [122442] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122442] <<<<<<<<<<<<<<<<<<<<<<
# sub process 4
[func] [122443] >>>>>>>>>>>>>>>>>>>>>>
[func] [122443] set ONEAPI_DEVICE_SELECTOR = level_zero:4
[func] [122443] x = 4
[func] [122443] ONEAPI_DEVICE_SELECTOR = level_zero:4
[func] [122443] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122443] <<<<<<<<<<<<<<<<<<<<<<
# sub process 5
[func] [122444] >>>>>>>>>>>>>>>>>>>>>>
[func] [122444] set ONEAPI_DEVICE_SELECTOR = level_zero:5
[func] [122444] x = 5
[func] [122444] ONEAPI_DEVICE_SELECTOR = level_zero:5
[func] [122444] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122444] <<<<<<<<<<<<<<<<<<<<<<
In the main process, I want to use GPU IDs [1, 2, 3, 4, 5] and create 5 processes, each of which uses one of those GPUs.
My question:
I set the environment variable ONEAPI_DEVICE_SELECTOR in each sub-process to select only one GPU, but each sub-process can still see all 6 GPUs.
Do I need to reload DPC++ in the Python code?
Or can ONEAPI_DEVICE_SELECTOR not be used with dpctl in a nested case?
Or does ONEAPI_DEVICE_SELECTOR only work with the sycl-ls command?
Please be aware that the ONEAPI_DEVICE_SELECTOR string spec and the filter-selector string spec are not exactly aligned, which may be playing tricks on your experiment. See https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector for the complete specification.
ONEAPI_DEVICE_SELECTOR requires you to specify the backend and the device type, so I'd expect
$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 sycl-ls
to show only one device, while I'd expect the syntax used in your script to show all devices, because the incorrectly formed string is ignored:
$ ONEAPI_DEVICE_SELECTOR=level_zero:0 sycl-ls
Maybe you haven't tested this usage of ONEAPI_DEVICE_SELECTOR:
$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu:0.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.
SYCL Exception encountered: Error parsing selector string "level_zero:gpu:0" Too many colons (:)
$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:1,3 sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu:1,3.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.
SYCL Exception encountered: Error parsing selector string "level_zero:gpu:1,3" Too many colons (:)
Have you ever used the environment variable ONEAPI_DEVICE_SELECTOR to filter or select a specific device?
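The "Too many colons" errors above suggest that a selector term takes the form backend:devices with a single colon, whereas a dpctl filter string has three parts (backend:device_type:relative_id). A minimal sketch of converting one into the other follows. The helper name filter_string_to_selector is hypothetical, and the sketch assumes that the relative id in the filter string matches the index the selector expects, which may not hold when a backend exposes more than one device type; check the linked specification before relying on it.

```python
def filter_string_to_selector(filter_string: str) -> str:
    """Turn 'backend:device_type:relative_id' into 'backend:relative_id'.

    Hypothetical helper, not part of dpctl. It drops the device type and
    assumes the relative id can be reused as the selector index.
    """
    parts = filter_string.split(":")
    if len(parts) != 3:
        raise ValueError(
            f"expected 'backend:device_type:relative_id', got {filter_string!r}"
        )
    backend, _device_type, relative_id = parts
    if not relative_id.isdigit():
        raise ValueError(f"relative id must be a number, got {relative_id!r}")
    return f"{backend}:{relative_id}"


if __name__ == "__main__":
    # A three-part filter string becomes a two-part selector term.
    print(filter_string_to_selector("level_zero:gpu:1"))
```

The result has exactly one colon, so it would not trip the parser error shown above.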
@oleksandr-pavlyk I faced a similar situation: when ONEAPI_DEVICE_SELECTOR is modified after import dpctl, subsequent calls to get_devices don't respect the modified filter.
import dpctl
import os

print("First import dpctl")  # this will print all gpu devices
os.environ["ONEAPI_DEVICE_SELECTOR"]="level_zero:gpu"
for d in dpctl.get_devices():
    d.print_device_info()
print("============")

print("Does not work on the same process")  # this will not print selected device
os.environ["ONEAPI_DEVICE_SELECTOR"]="level_zero:1"
import dpctl
for d in dpctl.get_devices():
    d.print_device_info()

print("Works on another process")
import subprocess
os.environ["ONEAPI_DEVICE_SELECTOR"]="level_zero:1"
subprocess.run(["python", "process.py"], capture_output=False)
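Since the variable appears to take effect only in a freshly started process, one option is to pass it to each child explicitly instead of mutating os.environ in the parent. The following is a minimal sketch under that assumption. The helper name run_with_selector is hypothetical, and the child snippet here only echoes the variable so that the example runs without a GPU.

```python
import os
import subprocess
import sys

ENV_VAR = "ONEAPI_DEVICE_SELECTOR"


def run_with_selector(selector: str, code: str) -> str:
    """Run a Python snippet in a fresh interpreter with ENV_VAR set.

    Hypothetical helper: the child starts with the selector already in its
    environment, so any runtime it loads reads the value at startup.
    """
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env[ENV_VAR] = selector
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The child only echoes the variable it was started with.
    child_code = "import os; print(os.environ['ONEAPI_DEVICE_SELECTOR'])"
    for index in [1, 2, 3, 4, 5]:
        print(run_with_selector(f"level_zero:{index}", child_code), end="")
```

To check the devices actually visible to each child, the snippet could be replaced with something like "import dpctl; print([d.filter_string for d in dpctl.get_devices()])"; that variant was not run here and its output depends on the installed runtime.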