[BUG] icx toolchain doesn't compile SYCL programs correctly
Xmake Version
2.9.9
Operating System Version and Architecture
Windows 10
Describe the Bug
The deprecated dpcpp toolchain works fine:
add_rules("mode.debug", "mode.release")
set_toolchains("dpcpp")
set_languages("cxx17")
target("intel_sycl")
    set_kind("binary")
    add_files("src/*.cpp")
But a program compiled with the icx toolchain does not run:
add_rules("mode.debug", "mode.release")
set_toolchains("icx")
set_languages("cxx17")
add_cxflags("/EHsc")
add_cxflags("-fsycl")
target("intel_sycl")
    set_kind("binary")
    add_files("src/*.cpp")
The problem can be reproduced by building and running the example in this repo, which is one of the SYCL code samples provided by Intel: https://github.com/mccakit/xmake_sycl
Expected Behavior
The icx toolchain should be able to compile and run SYCL programs.
Project Configuration
add_rules("mode.debug", "mode.release")
set_toolchains("icx")
set_languages("cxx17")
add_cxflags("/EHsc")
add_cxflags("-fsycl")
target("intel_sycl")
    set_kind("binary")
    add_files("src/*.cpp")
//==============================================================
// Vector Add is the equivalent of a Hello, World! sample for data parallel
// programs. Building and running the sample verifies that your development
// environment is set up correctly and demonstrates the use of the core features
// of SYCL. This sample runs on both CPU and GPU (or FPGA). When run, it
// computes on both the CPU and offload device, then compares results. If the
// code executes on both CPU and offload device, the device name and a success
// message are displayed. And, your development environment is set up correctly!
//
// For comprehensive instructions regarding SYCL Programming, go to
// https://software.intel.com/en-us/oneapi-programming-guide and search based on
// relevant terms noted in the comments.
//
// SYCL material used in the code sample:
// • A one dimensional array of data shared between CPU and offload device.
// • A device queue and kernel.
//==============================================================
// Copyright © Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <sycl/sycl.hpp>
#include <array>
#include <iostream>
#include <string>
#if FPGA_HARDWARE || FPGA_EMULATOR || FPGA_SIMULATOR
#include <sycl/ext/intel/fpga_extensions.hpp>
#endif
using namespace sycl;
// Array size for this example.
size_t array_size = 10000;
// Create an exception handler for asynchronous SYCL exceptions
static auto exception_handler = [](sycl::exception_list e_list) {
  for (std::exception_ptr const &e : e_list) {
    try {
      std::rethrow_exception(e);
    }
    catch (std::exception const &e) {
#if _DEBUG
      std::cout << "Failure" << std::endl;
#endif
      std::terminate();
    }
  }
};
//************************************
// Vector add in SYCL on device: returns sum in 4th parameter "sum".
//************************************
void VectorAdd(queue &q, const int *a, const int *b, int *sum, size_t size) {
  // Create the range object for the arrays.
  range<1> num_items{size};
  // Use parallel_for to run vector addition in parallel on device. This
  // executes the kernel.
  // 1st parameter is the number of work items.
  // 2nd parameter is the kernel, a lambda that specifies what to do per
  // work item. the parameter of the lambda is the work item id.
  // SYCL supports unnamed lambda kernel by default.
  auto e = q.parallel_for(num_items, [=](auto i) { sum[i] = a[i] + b[i]; });
  // q.parallel_for() is an asynchronous call. SYCL runtime enqueues and runs
  // the kernel asynchronously. Wait for the asynchronous call to complete.
  e.wait();
}
//************************************
// Initialize the array from 0 to array_size - 1
//************************************
void InitializeArray(int *a, size_t size) {
  for (size_t i = 0; i < size; i++) a[i] = i;
}
//************************************
// Demonstrate vector add both in sequential on CPU and in parallel on device.
//************************************
int main(int argc, char* argv[]) {
  // Change array_size if it was passed as argument
  if (argc > 1) array_size = std::stoi(argv[1]);
  // Create device selector for the device of your interest.
#if FPGA_EMULATOR
  // Intel extension: FPGA emulator selector on systems without FPGA card.
  auto selector = sycl::ext::intel::fpga_emulator_selector_v;
#elif FPGA_SIMULATOR
  // Intel extension: FPGA simulator selector on systems without FPGA card.
  auto selector = sycl::ext::intel::fpga_simulator_selector_v;
#elif FPGA_HARDWARE
  // Intel extension: FPGA selector on systems with FPGA card.
  auto selector = sycl::ext::intel::fpga_selector_v;
#else
  // The default device selector will select the most performant device.
  auto selector = default_selector_v;
#endif
  try {
    queue q(selector, exception_handler);
    // Print out the device information used for the kernel code.
    std::cout << "Running on device: "
              << q.get_device().get_info<info::device::name>() << "\n";
    std::cout << "Vector size: " << array_size << "\n";
    // Create arrays with "array_size" to store input and output data. Allocate
    // unified shared memory so that both CPU and device can access them.
    int *a = malloc_shared<int>(array_size, q);
    int *b = malloc_shared<int>(array_size, q);
    int *sum_sequential = malloc_shared<int>(array_size, q);
    int *sum_parallel = malloc_shared<int>(array_size, q);
    if ((a == nullptr) || (b == nullptr) || (sum_sequential == nullptr) ||
        (sum_parallel == nullptr)) {
      if (a != nullptr) free(a, q);
      if (b != nullptr) free(b, q);
      if (sum_sequential != nullptr) free(sum_sequential, q);
      if (sum_parallel != nullptr) free(sum_parallel, q);
      std::cout << "Shared memory allocation failure.\n";
      return -1;
    }
    // Initialize input arrays with values from 0 to array_size - 1
    InitializeArray(a, array_size);
    InitializeArray(b, array_size);
    // Compute the sum of two arrays in sequential for validation.
    for (size_t i = 0; i < array_size; i++) sum_sequential[i] = a[i] + b[i];
    // Vector addition in SYCL.
    VectorAdd(q, a, b, sum_parallel, array_size);
    // Verify that the two arrays are equal.
    for (size_t i = 0; i < array_size; i++) {
      if (sum_parallel[i] != sum_sequential[i]) {
        std::cout << "Vector add failed on device.\n";
        return -1;
      }
    }
    int indices[]{0, 1, 2, (static_cast<int>(array_size) - 1)};
    constexpr size_t indices_size = sizeof(indices) / sizeof(int);
    // Print out the result of vector add.
    for (int i = 0; i < indices_size; i++) {
      int j = indices[i];
      if (i == indices_size - 1) std::cout << "...\n";
      std::cout << "[" << j << "]: " << j << " + " << j << " = "
                << sum_sequential[j] << "\n";
    }
    free(a, q);
    free(b, q);
    free(sum_sequential, q);
    free(sum_parallel, q);
  } catch (exception const &e) {
    std::cout << "An exception is caught while adding two vectors.\n";
    std::terminate();
  }
  std::cout << "Vector add successfully completed on device.\n";
  return 0;
}
Additional Information and Error Logs
[ 75%]: linking.release intel_sycl.exe
[100%]: build ok, spent 3.782s
PS C:\Users\cakit\Desktop\intel_sycl> xmake run
Running on device: Intel(R) Iris(R) Xe Graphics
Vector size: 10000
An exception is caught while adding two vectors.
error: execv(C:\Users\cakit\Desktop\intel_sycl\build\windows\x64\release\intel_sycl.exe ) failed(-1073740791)
PS C:\Users\cakit\Desktop\intel_sycl>
We only deal with compilation configuration and build issues. For runtime issues, you can debug them yourself. I'm not sure if it's a compiler problem or something else.
You can check for incorrect compilation flags yourself by running xmake -v.
It is not a compiler issue: I can compile the sample with icx from the oneAPI shell and run it successfully, but every build made with xmake fails.
I get stack buffer overruns with all targets built with the xmake icx toolchain.
C:\Users\cakit\Desktop\intel_sycl\build\windows\x64\release>gdb-oneapi intel_sycl.exe
GNU gdb (Intel(R) Distribution for GDB* 2025.1.0) 15.2
Copyright (C) 2025 Free Software Foundation, Inc.; (C) 2025 Intel Corp.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
Type "show configuration" for configuration details.
For information about how to find Technical Support, Product Updates,
User Forums, FAQs, tips and tricks, and other support information, please visit:
<http://www.intel.com/software/products/support/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from intel_sycl.exe...
(No debugging symbols found in intel_sycl.exe)
(gdb) run
Starting program: C:\Users\cakit\Desktop\intel_sycl\build\windows\x64\release\intel_sycl.exe
[New Thread 9544.0x1e70]
[New Thread 9544.0x41a4]
[New Thread 9544.0x4578]
[New Thread 9544.0x2738]
Running on device: Intel(R) Iris(R) Xe Graphics
Vector size: 10000
An exception is caught while adding two vectors.
gdb: unknown target exception 0xc0000409 at 0x7ff94311286e
Thread 1 received signal ?, Unknown signal.
0x00007ff94311286e in ucrtbase!abort () from C:\WINDOWS\System32\ucrtbase.dll
(gdb) exit
When building, I get flag incompatibility warnings:
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2025.1.0 Build 20250317
Copyright (C) 1985-2025 Intel Corporation. All rights reserved.
icx: warning: unknown argument ignored in clang-cl '-std=c++17'; did you mean '-Qstd=c++17'? [-Wunknown-argument]
icx: warning: unknown argument ignored in clang-cl: '-fexceptions' [-Wunknown-argument]
icx: warning: unknown argument ignored in clang-cl: '-fcxx-exceptions' [-Wunknown-argument]
icx: warning: unknown argument ignored in clang-cl '-MMD'; did you mean '-QMMD'? [-Wunknown-argument]
icx: warning: unknown argument ignored in clang-cl: '-MF' [-Wunknown-argument]
You can copy the compilation commands from the xmake -v output, then remove or adjust individual flags and rerun them to locate the problem.
I tried that without success; someone more knowledgeable should look into it. I even tried reverting to the previous commit.
I don't have an icx environment to debug this, and I don't think it is an xmake problem: xmake does not add any extra flags that would affect runtime behavior. If the program fails because some flags are missing, you can add them yourself with add_cxflags/add_ldflags.
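For reference, a hedged sketch of how the flags discussed in this thread could be set explicitly with add_cxflags/add_ldflags. It starts from the project configuration in this report. The -Qstd=c++17 spelling comes from icx's own warning, and passing -fsycl at link time is an assumption: SYCL programs are normally linked with -fsycl so that the device code and the SYCL runtime end up in the executable, but whether the flag reaches the right tool depends on which linker the icx toolchain invokes on Windows. This is not a verified fix; check the actual commands with xmake -v.

```lua
-- Sketch only: the project configuration from this report with the flags made explicit.
add_rules("mode.debug", "mode.release")
set_toolchains("icx")
set_languages("cxx17")

target("intel_sycl")
    set_kind("binary")
    add_files("src/*.cpp")
    -- icx runs in clang-cl mode on Windows and ignores -fexceptions,
    -- so request C++ exception handling with the MSVC-style flag instead.
    add_cxflags("/EHsc")
    -- icx ignores -std=c++17 here; its warning suggests -Qstd=c++17.
    -- force = true skips xmake's flag check and passes the flag as-is.
    add_cxflags("-Qstd=c++17", {force = true})
    -- Compile with SYCL enabled, as in the original configuration.
    add_cxflags("-fsycl")
    -- Assumption: also pass -fsycl when linking. Verify with xmake -v that the
    -- link command goes through icx and actually contains this flag.
    add_ldflags("-fsycl", {force = true})
```

If the link step turns out to use a linker that does not understand -fsycl, the last line would need to be adjusted accordingly.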
As I said, I have no idea why this fails. You added Windows support to the icx toolchain, so please look into it if you have time to spare.
Please do look into it; vendor-independent GPU programming is something I care about. Eventually Intel will remove the dpcpp toolchain, and only icx will remain.
I don't have an icx Windows environment to test with right now, and I won't have time to debug it in the near future, so for now I can only deal with compilation-related issues, not runtime ones.
In addition, when I added icx support, I tested it, and at least basic programs ran normally.