
ONEAPI_DEVICE_SELECTOR not working under sub-process

harborn opened this issue 1 year ago • 4 comments (status: open)

My environment:

$ export ONEAPI_DEVICE_SELECTOR="level_zero:gpu"
$ sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.

[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:2] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:3] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:4] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:5] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1550 1.3 [1.3.26241]

I have the following code in the file test_multi_process.py:

import os
from multiprocessing import Pool
import time
import dpctl

env_var = "ONEAPI_DEVICE_SELECTOR"
backend = "level_zero"
device_type = "gpu"


def get_dev():
    dev_ids = []
    for dev in dpctl.get_devices(backend=backend, device_type=device_type):
        # device filter_string with format: "backend:device_type:relative_id"
        # dev_ids.append(int(dev.filter_string.split(":")[-1]))
        dev_ids.append(dev.filter_string)
    return dev_ids


def set_env_var(dev_id):
    env_val = f"{backend}:{dev_id}"
    print(f"[func] [{os.getpid()}] set {env_var} = {env_val}")
    os.environ[env_var] = env_val


def func(x):
    print(f"[func] [{os.getpid()}] >>>>>>>>>>>>>>>>>>>>>>")
    set_env_var(x)
    env_val = os.environ.get(env_var, None)
    print(f"[func] [{os.getpid()}] x = {x}") 
    print(f"[func] [{os.getpid()}] {env_var} = {env_val}")
    dev_ids = get_dev()
    print(f"[func] [{os.getpid()}] dev_ids = {dev_ids}")
    print(f"[func] [{os.getpid()}] <<<<<<<<<<<<<<<<<<<<<<\n") 


def main():
    env_val = os.environ.get(env_var, None)
    print(f"[main] [{os.getpid()}] {env_var} = {env_val}")
    dev_ids = get_dev()
    print(f"[main] [{os.getpid()}] dev_ids = {dev_ids}")
    with Pool(5) as p:
        p.map(func, [1, 2, 3, 4, 5])

                                                                                                                                                                                                                                                               
if __name__ == '__main__':
    main()

Running test_multi_process.py with

$ python test_multi_process.py

produces the following log:

[main] [122421] ONEAPI_DEVICE_SELECTOR = level_zero:gpu
[main] [122421] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']

# sub process 1
[func] [122440] >>>>>>>>>>>>>>>>>>>>>>
[func] [122440] set ONEAPI_DEVICE_SELECTOR = level_zero:1
[func] [122440] x = 1
[func] [122440] ONEAPI_DEVICE_SELECTOR = level_zero:1
[func] [122440] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122440] <<<<<<<<<<<<<<<<<<<<<<

# sub process 2
[func] [122441] >>>>>>>>>>>>>>>>>>>>>>
[func] [122441] set ONEAPI_DEVICE_SELECTOR = level_zero:2
[func] [122441] x = 2
[func] [122441] ONEAPI_DEVICE_SELECTOR = level_zero:2
[func] [122441] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122441] <<<<<<<<<<<<<<<<<<<<<<

# sub process 3
[func] [122442] >>>>>>>>>>>>>>>>>>>>>>
[func] [122442] set ONEAPI_DEVICE_SELECTOR = level_zero:3
[func] [122442] x = 3
[func] [122442] ONEAPI_DEVICE_SELECTOR = level_zero:3
[func] [122442] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122442] <<<<<<<<<<<<<<<<<<<<<<

# sub process 4
[func] [122443] >>>>>>>>>>>>>>>>>>>>>>
[func] [122443] set ONEAPI_DEVICE_SELECTOR = level_zero:4
[func] [122443] x = 4
[func] [122443] ONEAPI_DEVICE_SELECTOR = level_zero:4
[func] [122443] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122443] <<<<<<<<<<<<<<<<<<<<<<

# sub process 5
[func] [122444] >>>>>>>>>>>>>>>>>>>>>>
[func] [122444] set ONEAPI_DEVICE_SELECTOR = level_zero:5
[func] [122444] x = 5
[func] [122444] ONEAPI_DEVICE_SELECTOR = level_zero:5
[func] [122444] dev_ids = ['level_zero:gpu:0', 'level_zero:gpu:1', 'level_zero:gpu:2', 'level_zero:gpu:3', 'level_zero:gpu:4', 'level_zero:gpu:5']
[func] [122444] <<<<<<<<<<<<<<<<<<<<<<

From the main process, I want to use GPUs 1, 2, 3, 4 and 5 by creating 5 worker processes, each of which uses exactly one of these GPUs.

My question: I set ONEAPI_DEVICE_SELECTOR in each sub-process so that it should see only one GPU, yet every sub-process can still see all 6 GPUs. Do I need to reload dpcpp from the Python code? Or can ONEAPI_DEVICE_SELECTOR not be used with dpctl in this nested case? Or does ONEAPI_DEVICE_SELECTOR only work with a command such as sycl-ls?

harborn, Sep 19 '23 01:09
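On the "reload" part of the question: in Python, a second `import` of a module that is already loaded does not run the module again; it is served from `sys.modules`. `importlib.reload` re-executes only the module's own Python source. The stdlib sketch below uses `json` as a stand-in for `dpctl`. Whether reloading `dpctl` would make the SYCL runtime re-read ONEAPI_DEVICE_SELECTOR is doubtful, since that runtime lives in native shared libraries that stay loaded; treat this as an assumption, not a tested result.

```python
import importlib
import sys

import json  # stand-in for dpctl; any already-importable module behaves the same

first = sys.modules["json"]

import json  # second import: looked up in sys.modules, the module body is not re-run

print(json is first)  # True: still the very same module object

# reload() re-executes this module's Python source in place and returns the
# same module object; it does not unload native libraries already in memory.
reloaded = importlib.reload(json)
print(reloaded is first)  # True
```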

Please be aware that ONEAPI_DEVICE_SELECTOR string specs and filter-selector string specs are not exactly aligned, and that may be playing tricks on your experiment. See https://intel.github.io/llvm-docs/EnvironmentVariables.html#oneapi-device-selector for the complete specification.

The ONEAPI_DEVICE_SELECTOR requires you to specify the backend and the device type, so I'd expect

$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 sycl-ls

to show only one device, and I'd expect the syntax used in your script to show all devices, because an incorrectly formed string is ignored:

$ ONEAPI_DEVICE_SELECTOR=level_zero:0 sycl-ls

oleksandr-pavlyk, Sep 19 '23 20:09

Maybe you haven't tested this usage of ONEAPI_DEVICE_SELECTOR:

$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:0 sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu:0.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.

SYCL Exception encountered: Error parsing selector string "level_zero:gpu:0"  Too many colons (:)
$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu:1,3 sycl-ls
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to level_zero:gpu:1,3.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.

SYCL Exception encountered: Error parsing selector string "level_zero:gpu:1,3"  Too many colons (:)

harborn, Sep 20 '23 02:09
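Going by the "Too many colons (:)" errors from sycl-ls and the specification linked earlier in the thread, a single ONEAPI_DEVICE_SELECTOR term appears to take exactly one colon, between the backend and a comma-separated list of devices (for example `level_zero:1,3`). The helper below, `make_selector`, is hypothetical and only sketches that format; it does not check everything the runtime accepts or rejects.

```python
def make_selector(backend, device_ids):
    """Hypothetical helper: build one ONEAPI_DEVICE_SELECTOR term.

    The backend and the comma-separated device numbers are joined by a single
    colon, e.g. "level_zero:1,3". Adding a device type with a second colon
    ("level_zero:gpu:1,3") is the form sycl-ls rejected with "Too many colons".
    """
    if not backend or ":" in backend:
        raise ValueError(f"expected a bare backend name, got {backend!r}")
    if not device_ids:
        raise ValueError("expected at least one device id")
    return f"{backend}:{','.join(str(int(i)) for i in device_ids)}"


print(make_selector("level_zero", [1]))     # level_zero:1
print(make_selector("level_zero", [1, 3]))  # level_zero:1,3
```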

Have you ever used the ONEAPI_DEVICE_SELECTOR environment variable to filter for, or select, a specific device?

harborn, Sep 20 '23 02:09

@oleksandr-pavlyk I ran into a similar situation: when ONEAPI_DEVICE_SELECTOR is modified after dpctl has been imported, calling get_devices again does not respect the modified filter.

import dpctl
import os

print("First import dpctl")  # this will print all gpu devices
os.environ["ONEAPI_DEVICE_SELECTOR"]="level_zero:gpu"
for d in dpctl.get_devices():
    d.print_device_info()

print("============")

print("Does not work on the same process") # this will not print selected device
os.environ["ONEAPI_DEVICE_SELECTOR"]="level_zero:1"
import dpctl
for d in dpctl.get_devices():
    d.print_device_info()

print("Works on another process")
import subprocess
os.environ["ONEAPI_DEVICE_SELECTOR"]="level_zero:1"
subprocess.run(["python", "process.py"], capture_output=False)

xwu99, Sep 20 '23 08:09
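xwu99's last example reports that the selector is honored by a separate process started with `subprocess.run`. A plausible reading, not confirmed in this thread, is that the SYCL runtime reads ONEAPI_DEVICE_SELECTOR once, when it first enumerates devices: `main()` in test_multi_process.py calls `get_dev()` before creating the `Pool`, and on Linux, where Python has traditionally defaulted to the `fork` start method, each worker inherits that already-initialized state. The sketch below applies xwu99's approach to the original goal of one GPU per worker. It puts the selector into each child's environment before the interpreter starts, so it is in place before `dpctl` is imported. `run_worker`, `CHILD_CODE` and `main` are hypothetical names; the `level_zero:<id>` format is the one used in the original script.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

ENV_VAR = "ONEAPI_DEVICE_SELECTOR"
BACKEND = "level_zero"

# Code run by each child interpreter. The selector is already in its
# environment when this starts, i.e. before dpctl is imported.
CHILD_CODE = """
import os
print(os.environ.get("ONEAPI_DEVICE_SELECTOR"))
try:
    import dpctl
except ImportError:
    dpctl = None  # dpctl not installed: only report the selector
if dpctl is not None:
    for dev in dpctl.get_devices():
        print(dev.filter_string)
"""


def run_worker(dev_id):
    """Run CHILD_CODE in a fresh interpreter restricted to one device."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env[ENV_VAR] = f"{BACKEND}:{dev_id}"
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def main(dev_ids=(1, 2, 3, 4, 5)):
    # The threads only wait on the child processes; each child does its own work.
    with ThreadPoolExecutor(max_workers=len(dev_ids)) as pool:
        for dev_id, lines in zip(dev_ids, pool.map(run_worker, dev_ids)):
            print(f"[main] device {dev_id}: {lines}")


if __name__ == "__main__":
    main()
```

Copying `os.environ` into a per-child `env` dict, rather than assigning to `os.environ` in the parent, keeps the parent's own selector unchanged and gives every child a different value.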