Element size argument in TTL create tensor API
Hey,
The official TTL documentation shows that a user can create a TTL tensor by providing:
1- the pointer of the allocated memory
2- TTL_shape
3- TTL_layout.
(without providing the element_size of the element type).
Issue: when I compile the file below and pass element_size=4 (i.e. sizeof(int32_t)) when creating each of the TTL tensors, the output is exactly what I expect.
However, when I omit the element size, only 1/4 of the output elements match what I expect and the other 3/4 are completely different junk values. So I assume the DMA transaction did not go well.
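To illustrate why the 1/4 ratio makes me suspect the element size, here is a minimal plain-C sketch. It assumes, without my having verified it in the TTL sources, that an omitted element size ends up being treated as 1 byte, and that the buffer is contiguous. `fake_dma_copy` and `matching_elements` are hypothetical helpers written for this illustration only; they are not part of TTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a DMA transfer (NOT a TTL function):
 * copies `count` elements of `elem_size` bytes each. */
static void fake_dma_copy(void *dst, const void *src, size_t count, size_t elem_size) {
    assert(elem_size > 0);
    memcpy(dst, src, count * elem_size);
}

/* Copies 16 int32_t values using the given element size and returns
 * how many destination elements ended up equal to the source. */
static size_t matching_elements(size_t elem_size) {
    enum { COUNT = 16 };
    int32_t src[COUNT];
    int32_t dst[COUNT];

    for (size_t i = 0; i < COUNT; ++i) {
        src[i] = (int32_t)(i + 1) * 1000;
        dst[i] = -1; /* stands in for junk left in the destination */
    }

    fake_dma_copy(dst, src, COUNT, elem_size);

    size_t ok = 0;
    for (size_t i = 0; i < COUNT; ++i) {
        if (dst[i] == src[i]) {
            ++ok;
        }
    }
    return ok;
}
```

With an element size of 4 all 16 elements arrive, while an element size of 1 moves only 16 bytes, so just the first 4 of the 16 elements are correct. That is the same 1/4 ratio I see in my output.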
This is the imp.cl code which I'm trying to compile:
#include "TTL.h"
#include "memrefs.h"
#include "ocl_defs.h"
void _imp_addition_i32___kernel(Memref3D_I32_G* v0, Memref3D_I32_G* v1, Memref3D_I32_G* v2,
                                Memref3D_I32_L* b0, Memref3D_I32_L* b1, Memref3D_I32_L* b2,
                                Memref3D_I32_L* b3) {
    // /******************* import v0 -> b0 ***********************/
    TTL_event_t event0 = TTL_get_event();
    TTL_layout_t layout0 = TTL_create_layout(v0->strides[1] /*row_spacing*/, v0->strides[0] /*plane_spacing*/);
    TTL_shape_t shape0 = TTL_create_shape(v0->sizes[0] /*width*/, v0->sizes[1] /*height*/, v0->sizes[2] /*depth*/);
    const TTL_const_ext_tensor_t ext_tensor0 = TTL_create_const_ext_tensor(v0->aligned, shape0, layout0, 4 /*elem_size*/);
    const TTL_int_tensor_t int_tensor0 = TTL_create_int_tensor(b0->aligned, shape0, layout0, 4 /*elem_size*/);
    TTL_import(int_tensor0, ext_tensor0, &event0);
    TTL_wait(1 /*num_events*/, &event0);

    // /******************* import v1 -> b1 ***********************/
    TTL_event_t event1 = TTL_get_event();
    TTL_layout_t layout1 = TTL_create_layout(v1->strides[1] /*row_spacing*/, v1->strides[0] /*plane_spacing*/);
    TTL_shape_t shape1 = TTL_create_shape(v1->sizes[0] /*width*/, v1->sizes[1] /*height*/, v1->sizes[2] /*depth*/);
    const TTL_const_ext_tensor_t ext_tensor1 = TTL_create_const_ext_tensor(v1->aligned, shape1, layout1, 4 /*elem_size*/);
    const TTL_int_tensor_t int_tensor1 = TTL_create_int_tensor(b1->aligned, shape1, layout1, 4 /*elem_size*/);
    TTL_import(int_tensor1, ext_tensor1, &event1);
    TTL_wait(1 /*num_events*/, &event1);

    // /******************* export b2 -> v2 ***********************/
    TTL_event_t event2 = TTL_get_event();
    TTL_layout_t layout2 = TTL_create_layout(v2->strides[1] /*row_spacing*/, v2->strides[0] /*plane_spacing*/);
    TTL_shape_t shape2 = TTL_create_shape(v2->sizes[0] /*width*/, v2->sizes[1] /*height*/, v2->sizes[2] /*depth*/);
    const TTL_ext_tensor_t ext_tensor2 = TTL_create_ext_tensor(v2->aligned, shape2, layout2, 4 /*elem_size*/);
    const TTL_const_int_tensor_t int_tensor2 = TTL_create_const_int_tensor(b2->aligned, shape2, layout2, 4 /*elem_size*/);
    TTL_export(int_tensor2, ext_tensor2, &event2);
    TTL_wait(1 /*num_events*/, &event2);
    return;
}
The 'memrefs.h' header declares the memref types Memref3D_I32_G and Memref3D_I32_L like this:
typedef struct __attribute__((__packed__)) Memref3D_I32_L
{
    __local void *allocated;
    __local int32_t *aligned;
    int offset;
    int sizes[dims];
    int strides[dims];
} Memref3D_I32_L;
Memref3D_I32_G is declared the same way, except that it uses __global instead of __local.
Would be glad for your help/support. Thanks, Amir Bishara
We have a problem of sorts here: I split a large patch into smaller patches, as I should have, and in doing so broke this feature. It is correct once the whole stack lands, but fixing it here is actually difficult because of the way macros are used.
Can you try https://github.com/KhronosGroup/OpenCL-TTL/pull/14 and see how that works for you?