llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[SYCL] Explicit fill broken when using nested template types

Open j-stephan opened this issue 6 years ago • 29 comments

Example code:

#include <CL/sycl.hpp>

template <typename S>
struct parent
{
    template <typename T>
    struct foo
    {
        T val;
    };

    using Mask = foo<bool>;
    int some_val;
};


auto main() -> int
{
    using type = typename parent<int>::Mask;
    auto queue = cl::sycl::queue{cl::sycl::default_selector{}};
    auto buf = cl::sycl::buffer<type>{cl::sycl::range<1>{1024}};

    queue.submit([&](cl::sycl::handler& cgh)
    {
        auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
        cgh.fill(acc, type{true});
    });
    queue.wait();

    return 0;
}

Command line:

clang++ -std=c++2a -fsycl bool.cpp -ftemplate-backtrace-limit=0 -o bool -lOpenCL

Error message:

In file included from <built-in>:1:
/tmp/bool-29f3c3.h:44:43: error: no member named 'parent' in the global namespace
template <> struct KernelInfo<::__fill< ::parent<int>::foo<_Bool>, 1, cl::sycl::access::mode::read_write, cl::sycl::access::target::global_buffer, cl::sycl::access::placeholder::false_t>> {
                                        ~~^
/tmp/bool-29f3c3.h:44:53: error: expected '(' for function-style cast or type construction
template <> struct KernelInfo<::__fill< ::parent<int>::foo<_Bool>, 1, cl::sycl::access::mode::read_write, cl::sycl::access::target::global_buffer, cl::sycl::access::placeholder::false_t>> {
                                                 ~~~^
/tmp/bool-29f3c3.h:44:60: error: use of undeclared identifier '_Bool'
template <> struct KernelInfo<::__fill< ::parent<int>::foo<_Bool>, 1, cl::sycl::access::mode::read_write, cl::sycl::access::target::global_buffer, cl::sycl::access::placeholder::false_t>> {

j-stephan avatar Aug 07 '19 13:08 j-stephan

I tried to work around this by implementing a simple loop in a single_task which produced the same error. So this seems to be tied to the nested templates and not fill.

j-stephan avatar Aug 07 '19 14:08 j-stephan

@j-stephan please, try to eliminate bool from your example. If it doesn't resolve your issue, check if PR #464 helps you.

alexbatashev avatar Aug 07 '19 14:08 alexbatashev

Replacing bool with int produces the following error message:

In file included from <built-in>:1:
/tmp/bool-664cf7.h:39:43: error: no member named 'parent' in the global namespace
template <> struct KernelInfo<::__fill< ::parent<int>::foo<int>, 1, cl::sycl::access::mode::read_write, cl::sycl::access::target::global_buffer, cl::sycl::access::placeholder::false_t>> {
                                        ~~^
/tmp/bool-664cf7.h:39:53: error: expected '(' for function-style cast or type construction
template <> struct KernelInfo<::__fill< ::parent<int>::foo<int>, 1, cl::sycl::access::mode::read_write, cl::sycl::access::target::global_buffer, cl::sycl::access::placeholder::false_t>> {
                                                 ~~~^
/tmp/bool-664cf7.h:39:20: error: too many template arguments for class template 'KernelInfo'
template <> struct KernelInfo<::__fill< ::parent<int>::foo<int>, 1, cl::sycl::access::mode::read_write, cl::sycl::access::target::global_buffer, cl::sycl::access::placeholder::false_t>> {
                   ^                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/jan/software/sycl/intel/lib/clang/9.0.0/include/CL/sycl/detail/kernel_desc.hpp:55:40: note: template is declared here
template <class KernelNameType> struct KernelInfo {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~        ^
In file included from <built-in>:1:
/tmp/bool-664cf7.h:39:185: error: expected unqualified-id
template <> struct KernelInfo<::__fill< ::parent<int>::foo<int>, 1, cl::sycl::access::mode::read_write, cl::sycl::access::target::global_buffer, cl::sycl::access::placeholder::false_t>> {
                                                                                                                                                                                        ^
4 errors generated.

I will try out the pull request and report back.

j-stephan avatar Aug 07 '19 14:08 j-stephan

Applying the PR didn't change anything, the error message is identical to the one in my previous comment.

j-stephan avatar Aug 07 '19 15:08 j-stephan

As far as I understand, the problem is in the integration header generation. The function SYCLIntegrationHeader::emitForwardClassDecls in SemaSYCL.cpp does not add enough information to integration header. This problem refers to the types (TemplateArgument::ArgKind::Type), not template template parameters (TemplateArgument::ArgKind::Template) and PR #464 won't solve the problem. In the integration header there is forward declaration about struct foo (template <typename T> struct foo) in global space. There is not any information about parent structure in the integration header.

forward declaration block:

// Forward declarations of templated kernel function types:
template <typename T> struct foo;
template <typename DataT, int Dimensions, cl::sycl::access::mode AccessMode, cl::sycl::access::target AccessTarget, cl::sycl::access::placeholder IsPlaceholder> class __fill;

Kernel specialization:

// Specializations of KernelInfo for kernel function types:
template <> struct KernelInfo<::__fill< ::parent<int>::foo<_Bool>, 1, cl::sycl::access::mode::read_write, cl::sycl::access::target::global_buffer, cl::sycl::access::placeholder::false_t>> {
<...>
}

there is not any info about parent structure

idubinov avatar Aug 08 '19 09:08 idubinov

Is there an ETA for a fix? Unfortunately I can't easily work around this issue in the original codebase without investing a few weeks.

j-stephan avatar Aug 14 '19 17:08 j-stephan

We have the following issue here: fill uses the type of its parameter as part of the kernel name. Since this type is a nested class, the compiler doesn't know how to handle it, so it doesn't emit declaration for the outer class into the integration header. This is why these errors appear. But we can't emit a proper forward declaration for a nested class as it's not allowed by the language. There seems to be a compiler diagnostic for such cases, but it failed to catch this particular one.

@j-stephan There should be a simple workaround for this issue: as long as the nested class doesn't appear in kernel name, you won't get the errors. So you can re-write fill with parallel_for using something else to name the kernel. You can also use -fsycl-unnamed-lambda compiler option. It's an extension which allows to use kernel lambdas without naming them. In this case you can still use fill with nested types as it doesn't require forward declarations.

BTW, according to the SYCL specification, fill's type T must be an integral scalar value or a SYCL vector type. But in the example, the type is neither one of them. So it make sense to use parallel_for instead.

ilyastepykin avatar Aug 20 '19 16:08 ilyastepykin

BTW, according to the SYCL specification, fill's type T must be an integral scalar value or a SYCL vector type. But in the example, the type is neither one of them. So it make sense to use parallel_for instead.

Thanks, I didn't notice I wasn't conforming to the specification here. I will change that ASAP.

There should be a simple workaround for this issue:

Thanks for the workaround options, I'll keep them in mind. The compiler flag looks pretty useful.

For the original codebase I could fortunately apply a very simple "fix" by reinterpreting the type as uint8_t and then doing a memset with fill.

j-stephan avatar Aug 20 '19 18:08 j-stephan

I have the feeling that the SYCL specification should allow any trivially copyable data type.

keryell avatar Aug 21 '19 11:08 keryell

Looks like the restriction to use only scalar/vector types comes from OpenCL's clEnqueueFillBuffer function. For other trivially copyable types fill can be done via a fallback to regular parallel_for. Right now the same thing is done for 2D and 3D buffers as there is no corresponding function in OpenCL. I guess it make sense to allow fill with any trivially copyable types.

ilyastepykin avatar Sep 05 '19 08:09 ilyastepykin

Sorry for the necromancy, but this seems to have be fixed. At least your example compile in run in my version of the compiler

TApplencourt avatar Aug 02 '21 23:08 TApplencourt

@TApplencourt, thanks for confirming. I'm closing this issue. @j-stephan, please, let us know if you still have issues with building your example.

bader avatar Aug 03 '21 07:08 bader

@TApplencourt Which version of the compiler are you using? I just built the current sycl branch and still encounter the error:

$ clang++ --version
clang version 13.0.0 (https://github.com/intel/llvm 7735139bb4d8420fcde692b6eddd2ebf9257efe8)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/stepha27/sycl-workspace/llvm/build/bin
$ clang++ -std=c++2a -fsycl bool.cpp -ftemplate-backtrace-limit=0 -o bool -lOpenCL
In file included from <built-in>:1:
/tmp/bool-header-4edb34.h:49:78: error: no member named 'parent' in the global namespace
template <> struct KernelInfo<::sycl::detail::__pf_kernel_wrapper<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>>> {
                                                                           ~~^
/tmp/bool-header-4edb34.h:49:88: error: expected '(' for function-style cast or type construction
template <> struct KernelInfo<::sycl::detail::__pf_kernel_wrapper<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>>> {
                                                                                    ~~~^
/tmp/bool-header-4edb34.h:49:100: error: expected '(' for function-style cast or type construction
template <> struct KernelInfo<::sycl::detail::__pf_kernel_wrapper<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>>> {
                                                                                              ~~~~~^
/tmp/bool-header-4edb34.h:49:140: error: expected unqualified-id
template <> struct KernelInfo<::sycl::detail::__pf_kernel_wrapper<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>>> {
                                                                                                                                           ^
/tmp/bool-header-4edb34.h:49:140: error: expected ')'
/tmp/bool-header-4edb34.h:49:139: note: to match this '('
template <> struct KernelInfo<::sycl::detail::__pf_kernel_wrapper<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>>> {
                                                                                                                                          ^
/tmp/bool-header-4edb34.h:65:42: error: no member named 'parent' in the global namespace
template <> struct KernelInfo<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>> {
                                       ~~^
/tmp/bool-header-4edb34.h:65:52: error: expected '(' for function-style cast or type construction
template <> struct KernelInfo<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>> {
                                                ~~~^
/tmp/bool-header-4edb34.h:65:64: error: expected '(' for function-style cast or type construction
template <> struct KernelInfo<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>> {
                                                          ~~~~~^
/tmp/bool-header-4edb34.h:65:65: error: expected unqualified-id
template <> struct KernelInfo<::__fill<::parent<int>::foo<_Bool>, 1, static_cast<::sycl::access::mode>(1026), static_cast<::sycl::access::target>(2014), static_cast<::sycl::access::placeholder>(0)>> {
                                                                ^
9 errors generated.

@bader Unfortunately this doesn't seem to be fixed (at least for me).

j-stephan avatar Aug 04 '21 07:08 j-stephan

@j-stephan, thanks for checking that. I tried it myself and I see the same issue.

Based on the discussion, the original example doesn't seem to satisfy SYCL specification restrictions, so I'm not sure what is expected from the compiler here. I guess at least we can improve the diagnostics to make it clear for the user what is the problem. Right?

But we can't emit a proper forward declaration for a nested class as it's not allowed by the language. There seems to be a compiler diagnostic for such cases, but it failed to catch this particular one.

@erichkeane, @elizabethandrews, @premanandrao, do you think we can fix the diagnostics for this example?

Regarding the types we can use as fill function parameters, SYCL 2020 version of the spec still allows only scalar and vector types and current implementation assumes that, although it must verify that in more user-friendly manner. The runtime can't apply "kernel name type" restrictions support any "device copyable" types, so this will require some additional work. Tagging @romanovvlad, @vladimirlaz, to share thoughts on runtime part improvements.

@intel/dpcpp-specification-reviewers, what are you thoughts on extending the range of allowed types for fill function?

bader avatar Aug 04 '21 08:08 bader

I guess at least we can improve the diagnostics to make it clear for the user what is the problem. Right?

Sure, that would be sufficient (for me).

j-stephan avatar Aug 04 '21 08:08 j-stephan

@j-stephan, thanks for checking that. I tried it myself and I see the same issue.

Based on the discussion, the original example doesn't seem to satisfy SYCL specification restrictions, so I'm not sure what is expected from the compiler here. I guess at least we can improve the diagnostics to make it clear for the user what is the problem. Right?

But we can't emit a proper forward declaration for a nested class as it's not allowed by the language. There seems to be a compiler diagnostic for such cases, but it failed to catch this particular one.

@erichkeane, @elizabethandrews, @premanandrao, do you think we can fix the diagnostics for this example?

There is an opportunity to, yes... I'm a little surprised that what it is complaining about is 'parent' and not 'foo' though. I think @srividya-sundaram went through and did a refactor in this area a while back, but I believe she's still out. This seems worth fixing to me.

It is also really strange that it is forward-declaring 'foo' here....

erichkeane avatar Aug 04 '21 12:08 erichkeane

Min Repro that shows the problem:

template<typename T, typename Func>
__attribute__((sycl_kernel))
void kernel(const Func &f) { f(); }

template<typename T>
struct parent {
  template<typename U>
    struct child {
    };
  //using Mask = child<bool>;
};

int main() {
  kernel<parent<int>::child<bool>>([](){});
}

bash-4.2$ ./bin/clang -cc1 temp.cpp -fsycl-is-device -fsycl-int-header=header.h
bash-4.2$ cat header.h
// This is auto-generated SYCL integration header.

#include <CL/sycl/detail/defines_elementary.hpp>
#include <CL/sycl/detail/kernel_desc.hpp>

// Forward declarations of templated kernel function types:
template <typename U> struct child;

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {

// names of all kernels defined in the corresponding source
static constexpr
const char* const kernel_names[] = {
  "_ZTSN6parentIiE5childIbEE"
};

// array representing signatures of all kernels defined in the
// corresponding source
static constexpr
const kernel_param_desc_t kernel_signatures[] = {
  //--- _ZTSN6parentIiE5childIbEE

};

// Specializations of KernelInfo for kernel function types:
template <> struct KernelInfo<::parent<int>::child<_Bool>> {
  __SYCL_DLL_LOCAL
  static constexpr const char* getName() { return "_ZTSN6parentIiE5childIbEE"; }
  __SYCL_DLL_LOCAL
  static constexpr unsigned getNumParams() { return 0; }
  __SYCL_DLL_LOCAL
  static constexpr const kernel_param_desc_t& getParamDesc(unsigned i) {
    return kernel_signatures[i+0];
  }
  __SYCL_DLL_LOCAL
  static constexpr bool isESIMD() { return 0; }
  __SYCL_DLL_LOCAL
  static constexpr bool callsThisItem() { return 0; }
  __SYCL_DLL_LOCAL
  static constexpr bool callsAnyThisFreeFunction() { return 0; }
};

} // namespace detail
} // namespace sycl
} // __SYCL_INLINE_NAMESPACE(cl)

bash-4.2$

We should be error-ing based on the use of 'child', which isn't forward-declarable. The bit about the int-header not being valid is likely some assumption about it assuming that the parameter is valid.

erichkeane avatar Aug 04 '21 13:08 erichkeane

@schittir : This is likely a good task for you. See the SYCLKernelNameTypeVisitor, which isn't catching this situation for some reason. The minimal example I provided should be enough to figure out what is missing.

from a quick look at Visit(QualType T) overload, it appears that the class-template-specialization is handled to ONLY check the template args, but it should ALSO check the class template itself.

Additionally, it seems that the check in DiagnoseKernelNameType that splits on whether the tag is a complete definition is insufficient. For some reason child in the example above is not a complete definition (though I'm not sure why, that could use debugging!).

I believe this section is intended to cover the kernel<class N>(...) case, so we probably want to be more specific here, the completeness of the declaration is insufficient, because:

struct s {
       struct u;
   }
....
kernel<s::u>(...);

should ALSO hit that error (which is essentially the case that is happening here for some reason).

I believe @AaronBallman worked with @srividya-sundaram to write this code, so perhaps he can be of help.

erichkeane avatar Aug 04 '21 14:08 erichkeane

@j-sthephan. My bad, it works with the dpcpp compiler packaged with the OneAPI SDK but not with the clang++ compiler.

tapplencourt@foo:~/tmp> cat bool.cpp
#include <CL/sycl.hpp>

template <typename S>
struct parent
{
    template <typename T>
    struct foo
    {
        T val;
    };

    using Mask = foo<bool>;
    int some_val;
};


auto main() -> int
{
    using type = typename parent<int>::Mask;
    auto queue = cl::sycl::queue{cl::sycl::default_selector{}};
    auto buf = cl::sycl::buffer<type>{cl::sycl::range<1>{1024}};

    queue.submit([&](cl::sycl::handler& cgh)
    {
        auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
        cgh.fill(acc, type{true});
    });
    queue.wait();

    return 0;
}
tapplencourt@foo:~/tmp> dpcpp bool.cpp
tapplencourt@foo:~/tmp>

One should can investigate dpcpp -v to see what magic happend

TApplencourt avatar Aug 04 '21 19:08 TApplencourt

@j-sthephan. My bad, it works with the dpcpp compiler packaged with the OneAPI SDK but not with the clang++ compiler.

tapplencourt@foo:~/tmp> cat bool.cpp
#include <CL/sycl.hpp>

template <typename S>
struct parent
{
    template <typename T>
    struct foo
    {
        T val;
    };

    using Mask = foo<bool>;
    int some_val;
};


auto main() -> int
{
    using type = typename parent<int>::Mask;
    auto queue = cl::sycl::queue{cl::sycl::default_selector{}};
    auto buf = cl::sycl::buffer<type>{cl::sycl::range<1>{1024}};

    queue.submit([&](cl::sycl::handler& cgh)
    {
        auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
        cgh.fill(acc, type{true});
    });
    queue.wait();

    return 0;
}
tapplencourt@foo:~/tmp> dpcpp bool.cpp
tapplencourt@foo:~/tmp>

One should can investigate dpcpp -v to see what magic happend

That is going to be the -fsycl-unnamed-lambda extension on drops prior to 7/8 of this year. Before that point, all kernels were treated as 'unnamed', so it went through a different mechanism.

erichkeane avatar Aug 04 '21 19:08 erichkeane

Bingo. The code compiles with it:

tapplencourt@arcticus12:~/tmp> clang++ -fsycl -fsycl-unnamed-lambda bool.cpp
tapplencourt@arcticus12:~/tmp>

As said before, maybe it should not compile because it's not complient but 🤷🏽

TApplencourt avatar Aug 04 '21 19:08 TApplencourt

Bingo. The code compiles with it:

tapplencourt@arcticus12:~/tmp> clang++ -fsycl -fsycl-unnamed-lambda bool.cpp
tapplencourt@arcticus12:~/tmp>

As said before, maybe it should not compile because it's not complient but 🤷🏽

Yep, we should be diagnosing this in the non-unnamed-lambda/kernel case. Unfortunately our diagnostics in this section of code are pretty poor, particularly because differentiating between an inline-implicit declaration and an otherwise non-forward-declarable type is quite difficult.

erichkeane avatar Aug 04 '21 19:08 erichkeane

@bader

@intel/dpcpp-specification-reviewers, what are you thoughts on extending the range of allowed types for fill function?

Yes, the restrictions on the type of T in the handler::fill() function seem weird to me. Currently, we allow only scalar and vector types, but I think we should allow any "device copyable" type instead. Note that this overload of fill() writes into an accessor. Since the underlying type T of a buffer can be any device copyable type, it seems consistent that fill() could also take any device copyable type.

Is there any reason this is difficult to implement?

gmlueck avatar Aug 13 '21 21:08 gmlueck

Yes, the restrictions on the type of T in the handler::fill() function seem weird to me. Currently, we allow only scalar and vector types, but I think we should allow any "device copyable" type instead. Note that this overload of fill() writes into an accessor. Since the underlying type T of a buffer can be any device copyable type, it seems consistent that fill() could also take any device copyable type.

Is there any reason this is difficult to implement?

I don't think so. @romanovvlad, WDYT?

bader avatar Aug 16 '21 08:08 bader

Yes, the restrictions on the type of T in the handler::fill() function seem weird to me. Currently, we allow only scalar and vector types, but I think we should allow any "device copyable" type instead. Note that this overload of fill() writes into an accessor. Since the underlying type T of a buffer can be any device copyable type, it seems consistent that fill() could also take any device copyable type. Is there any reason this is difficult to implement?

I don't think so. @romanovvlad, WDYT?

It should be doable. handler::fill can just enqueue a kernel which does the trick.

romanovvlad avatar Aug 16 '21 22:08 romanovvlad

I opened a PR against the SYCL 2020 spec to remove the type restriction on handler::fill():

https://github.com/KhronosGroup/SYCL-Docs/pull/185

gmlueck avatar Aug 18 '21 13:08 gmlueck

The removal of types restriction for handler::fill is being done here: https://github.com/intel/llvm/pull/6714

romanovvlad avatar Sep 07 '22 12:09 romanovvlad

@romanovvlad, should we link this PR to close the issue when it's merged?

bader avatar Sep 07 '22 12:09 bader

@romanovvlad, should we link this PR to close the issue when it's merged?

Not sure, the original issue is that the integration header is incorrectly generated for the case shown in the first issue description.

romanovvlad avatar Sep 07 '22 12:09 romanovvlad