vexcl
                                
                                 vexcl copied to clipboard
                                
                                    vexcl copied to clipboard
                            
                            
                            
                        Support for custom kernels vector_view
It would be nice, if one could use vector views with custom kernels.
When calling such a function one needs to pass pointers instead of vectors. The vex::vectors provides operator()(int d) which provides the pointer to to the portion of the array on device d. vex::vector_view also has such an operator, but what it returns does not seem to be a valid pointer to the underlying data: when I tried to access a vectors wrapped in a view this way the kernel just crashed. Unfortunately I could not find any documentation on what vex::vector_view::operator()(int) is supposed to do. (i my case the sliced ranged is continuous (not stride), so pointer to the data can be computed).
It is not possible to send a vector_view to a custom kernel as a pointer, because a vector_view is an expression that is evaluated on the fly by vexcl.
You can assign the expression to a temporary vector, and get the pointer to that.
Or, instead of a custom kernel, you could use custom function that returns void:
#include <vexcl/vexcl.hpp>
int main() {
    vex::Context ctx(vex::Filter::Env);
    vex::vector<int> x(ctx, 16);
    vex::vector<int> y(ctx, 8);
    x = vex::element_index();
    vex::slicer<2> slice(vex::extents[8][2]);
    VEX_FUNCTION(void, assign, (int, idx)(int*, out)(int, in),
            out[idx] = in;
    );
    vex::eval(
            assign(vex::element_index(), raw_pointer(y), slice[vex::_][0](x)),
            y.queue_list(),
            y.partition());
    std::cout << "y = " << y << std::endl;
}
Here the function gets view of x in parameter in, and writes it to output.
It is good to know, that a VEX_FUNCTION can use a view. Maybe I could use this in some places.
- in one place I need to use a kernel, because using one thread per item would not be efficient. The code uses one warp block per item and performs a reduction over threads in a warp before writing the item
- In another place, I use a kernel just because it allows me include a device function in with the code. This function is called by the kernel and is inserted at runtime. There does not seem to be a way to define a VEX_FUNCTION from a string at runtime. Some really ugly macros would be required to do it at compile time.
I guess, vexcl will insert the VEX_FUNCTION as a device function into some generic kernel which handles access to the view and call the function for each item. Would it be possible to expose this interface to custom kernels?
There does not seem to be a way to define a VEX_FUNCTION from a string at runtime.
There is VEX_FUNCTION_S:
https://github.com/ddemidov/vexcl/blob/192137a2c1f74de97dc2859d8af76b822a9a4dd6/vexcl/function.hpp#L205-L210
Would it be possible to expose this interface to custom kernels?
You can get function definition from a VEX_FUNCTION as a string using define() method:
#include <vexcl/vexcl.hpp>
int main() {
    vex::Context ctx(vex::Filter::Env);
    VEX_FUNCTION(void, assign, (int, idx)(int*, out)(int, in),
            out[idx] = in;
    );
    vex::backend::source_generator src(ctx.queue(0));
    assign.define(src);
    std::cout << src.str() << std::endl;
}
With an OpenCL backend, this results in:
void assign
(
  int idx,
  global int * out,
  int in
)
{
  out[idx] = in;
}
There is VEX_FUNCTION_S: "The body of the function is passed as a string literal or a static string expression."
Ok, apparently, in this macro body can be anything that can be ostreamed, even expressions like
begin << middle << end
(not sure if this is meant by "static string expression"). Thanks. However, this still only works with literals. It would not work if e.g. middle was a local variable in calling function, as this code is inserted into a member function of the generated class. Also: I need the function object to be a member of some class, not just a local variable, which means I need to know of the object I will get before using VEX_FUNCTION_S, so I can declare the variable. All of this works with a custom kernel.
You can get function definition from a VEX_FUNCTION as a string using define() method:
This only returns the function definition, but I know, that I can find the generated kernel in vexcl's staging area. What I would like to have is a documented interface that defines how a vector_view is passed to a kernel and how it should be accessed.
not sure if this is meant by "static string expression"
This should be an expression that always returns the same string. Also, VEX_FUNCTION macro unpacks into something like
struct <name>_function_type { ... } const <name>;
so it can be included into your classes like this:
#include <vexcl/vexcl.hpp>
struct test {
    static std::string body() {
        return "return x;";
    }
    VEX_FUNCTION_S(int, foo, (int, x), body());
};
int main() {
    vex::Context ctx(vex::Filter::Env);
    test t;
    vex::backend::source_generator src(ctx.queue(0));
    t.foo.define(src);
    std::cout << src.str() << std::endl;
}
This only returns the function definition, but I know, that I can find the generated kernel in vexcl's staging area
You could just prepend the function definition to your kernel definition:
    VEX_FUNCTION(int, foo, (int, x), return x;);
    vex::backend::source_generator src(ctx.queue(0));
    foo.define(src);
    vex::backend::kernel test(ctx.queue(0), src.str() + VEX_STRINGIZE_SOURCE(
        kernel void test(ulong n, global int *x) {
            for(ulong i = get_global_id(0); i < n; i += get_global_size(0)) {
                x[i] = foo(x[i]);
            }
        }), "test");
What I would like to have is a documented interface that defines how a vector_view is passed to a kernel and how it should be accessed.
That is really not possible. vector_view is potentially a complex expression (something like slice[_](3 * cos(x)) is perfectly acceptable), so there is no generic way to pass the results to a custom kernel without a temporary vector.