`llvm-foreach` takes 100% cpu usage
Describe the bug
While building SYCL code with Intel oneAPI, I noticed that llvm-foreach is almost always sitting at 100% cpu usage.
top:
%Cpu(s): 8.5 us, 5.0 sy, 0.0 ni, 85.8 id, 0.1 wa, 0.0 hi, 0.6 si, 0.0 st
MiB Mem : 64023.7 total, 27107.4 free, 6165.8 used, 30750.5 buff/cache
MiB Swap: 32958.0 total, 32958.0 free, 0.0 used. 53965.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
99440 fwyzard 20 0 4540 2176 2048 R 99.7 0.0 5:42.07 llvm-foreach
100326 fwyzard 20 0 309368 274324 48756 R 99.3 0.4 0:05.92 ocloc
ps -xf:
98325 pts/2 S+ 0:00 | | | \_ /opt/intel/oneapi/compiler/2024.1/bin/compiler/clang++ @/tmp/icpx0294253703WMgiHH/icpxargD9hFos
99440 pts/2 R+ 5:42 | | | \_ /opt/intel/oneapi/compiler/2024.1/bin/compiler/llvm-foreach --out-ext=out --in-file-list=/tmp/icpx-ff969312fd/Activemask-tgllp-63b648.txt --in-replace=/tmp/icpx-ff969312fd/Activemask-tgllp-63b648.txt --ou
100326 pts/2 R+ 0:06 | | | \_ /usr/bin/ocloc -output /tmp/Activemask-tgllp-e57dbd-65fea9.out -file /tmp/icpx-ff969312fd/Activemask-tgllp-63b648-0e09e1.spv -output_no_suffix -spirv_input -device tgllp -options -g -cl-opt-disable
This seems to happen for any backend. I've observed this consistently with oneAPI 2024.0 (based on LLVM 17) and 2024.2 (based on LLVM 19), running on Ubuntu Linux 22.04.
To reproduce
Build any complex program with ahead-of-time compilation for multiple backends, e.g. multiple Intel GPUs.
Environment
- OS: Ubuntu Linux 22.04
- Target device and vendor: any backend.
- DPC++ version:
Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)
- Dependencies version:
sycl-ls --verbose
[opencl:cpu][opencl:0] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics OpenCL 3.0 NEO [24.22.29735.27]
[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3050 Ti Laptop GPU 8.6 [CUDA 12.6]
Platforms: 4
Platform [#1]:
Version : OpenCL 3.0 LINUX
Name : Intel(R) OpenCL
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : cpu
Version : OpenCL 3.0 (Build 0)
Name : 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
Vendor : Intel(R) Corporation
Driver : 2024.18.7.0.11_160000
Aspects : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group
info::device::sub_group_sizes: 4 8 16 32 64
Platform [#2]:
Version : OpenCL 3.0
Name : Intel(R) OpenCL Graphics
Vendor : Intel(R) Corporation
Devices : 1
Device [#1]:
Type : gpu
Version : OpenCL 3.0 NEO
Name : Intel(R) UHD Graphics
Vendor : Intel(R) Corporation
Driver : 24.22.29735.27
Aspects : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations atomic64 ext_oneapi_srgb ext_intel_device_id ext_intel_legacy_image ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group
info::device::sub_group_sizes: 8 16 32
Platform [#3]:
Version : 1.3
Name : Intel(R) Level-Zero
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 1.3
Name : Intel(R) UHD Graphics
Vendor : Intel(R) Corporation
Driver : 1.3.29735
Aspects : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_graph
info::device::sub_group_sizes: 8 16 32
Platform [#4]:
Version : CUDA 12.6
Name : NVIDIA CUDA BACKEND
Vendor : NVIDIA Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 8.6
Name : NVIDIA GeForce RTX 3050 Ti Laptop GPU
Vendor : NVIDIA Corporation
Driver : CUDA 12.6
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering
ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
info::device::sub_group_sizes: 32
default_selector() : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
accelerator_selector() : No device of requested type available. Please chec...
cpu_selector() : cpu, Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
gpu_selector() : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
custom_selector(gpu) : gpu, Intel(R) Level-Zero, Intel(R) UHD Graphics 1.3 [1.3.29735]
custom_selector(cpu) : cpu, Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
custom_selector(acc) : No device of requested type available. Please chec...
Additional context
No response
@fwyzard, the problem is the ocloc tool. llvm-foreach is just a simple launcher that runs commands from a file and waits for them to complete. You can check the logic here - it's ~200 lines of code.
NOTE: the ocloc tool is developed in https://github.com/intel/intel-graphics-compiler/, so I would transfer this issue there.
@bader while it would definitely be nice if ocloc were faster, the issue is that llvm-foreach itself takes 100% cpu, in addition to ocloc taking up 100% cpu (on another core).
Instead of looping tightly, would it be possible to make llvm-foreach sleep until a subprocess completes?
Or, at least, to sleep for something like 100 ms between checks?
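To make the two suggestions above concrete, here is a minimal sketch in Python. It is not llvm-foreach's actual code (that tool is written in C++), and the function names `run_commands_blocking` and `run_commands_polling` are hypothetical, chosen only for illustration. The first variant blocks in the operating system until each child exits. The second polls but sleeps between checks. In both cases the launcher should use close to no CPU while the child is running.

```python
import subprocess
import sys
import time


def run_commands_blocking(commands):
    """Run each command in turn and block until it exits.

    Popen.wait() with no timeout blocks in the OS (waitpid on POSIX),
    so the launcher is not scheduled while the child is running.
    """
    exit_codes = []
    for argv in commands:
        child = subprocess.Popen(argv)
        exit_codes.append(child.wait())
    return exit_codes


def run_commands_polling(commands, interval=0.1):
    """Fallback: poll for completion, sleeping `interval` seconds per check."""
    exit_codes = []
    for argv in commands:
        child = subprocess.Popen(argv)
        while child.poll() is None:
            time.sleep(interval)  # yield the CPU instead of spinning
        exit_codes.append(child.returncode)
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a slow device compiler: a child that sleeps for one second.
    slow_child = [sys.executable, "-c", "import time; time.sleep(1)"]
    cpu_before = time.process_time()  # CPU time of this process only
    codes = run_commands_blocking([slow_child])
    cpu_used = time.process_time() - cpu_before
    print(f"exit codes: {codes}, launcher CPU time: {cpu_used:.3f}s")
```

The blocking variant is the simpler choice when commands run one at a time. The polling variant trades up to `interval` seconds of extra latency per command for the ability to do other work between checks, such as watching several children at once.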
I think we are going to remove this tool soon. We are refactoring the compilation process for offload code, and the new approach won't use this tool or a similar approach to detect task completion. @asudarsa, @maksimsab, @sarnex, FYI.
@ivorobts FYI
Yes. We are in the process of adding support for the '--offload-new-driver' flag, which can be used for SYCL offloading apps. This will trigger a compilation flow that does not use the 'llvm-foreach' tool. For the 'ocloc' issue, https://github.com/intel/intel-graphics-compiler/ will be a better place to report this. However, nearly 100% CPU utilization during the AOT stage is not something we expect. I will try to confirm this on my end.
Thanks for the report.
Hi! There have been no updates for at least the last 60 days, though the issue has assignee(s).
@asudarsa, could you please take one of the following actions:
- provide an update if you have any
- unassign yourself if you're not looking / going to look into this issue
- mark this issue with the 'confirmed' label if you have confirmed the problem/request and our team should work on it
- close the issue if it has been resolved
- take any other suitable action.
Thanks!
I am looking into this now. Thanks