Element size argument in TTL create tensor API

Open amirBish opened this issue 2 years ago • 1 comments

Hey,

The official TTL documentation shows that a user can create a TTL tensor by providing:

1- the pointer to the allocated memory
2- a TTL_shape
3- a TTL_layout
(without providing the element size of the element type).

Issue: When I compile the file below and pass element_size=4 (i.e. sizeof(int32_t)) when creating the different TTL tensors, the output is exactly what I expect.

However, when I do not provide the element size, only 1/4 of the output elements are as expected and the other 3/4 are junk values. So I assume the DMA transaction did not go well.
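One possible reading of the 1/4 symptom is that the omitted element size ends up as 1 byte, so only a quarter of the int32_t data is transferred. The sketch below is plain host C, not TTL code: `sketch_import` and `sketch_count_correct` are hypothetical helpers that model a transfer as a byte copy of `num_elements * elem_size` bytes, and the 1-byte default is an assumption, not something confirmed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a DMA import: copies num_elements * elem_size
 * bytes from src to dst. This is NOT the TTL implementation. */
static size_t sketch_import(void *dst, const void *src,
                            size_t num_elements, size_t elem_size) {
    assert(elem_size > 0);
    size_t bytes = num_elements * elem_size;
    memcpy(dst, src, bytes);
    return bytes;
}

/* Copies 16 int32_t values using the given elem_size and returns how many
 * destination elements match the source afterwards. The destination is
 * pre-filled with -1 to play the role of junk values. */
static size_t sketch_count_correct(size_t elem_size) {
    enum { N = 16 };
    int32_t src[N];
    int32_t dst[N];
    for (size_t i = 0; i < N; ++i) {
        src[i] = (int32_t)(1000 + i);
        dst[i] = -1; /* junk left over in local memory */
    }
    sketch_import(dst, src, N, elem_size);

    size_t correct = 0;
    for (size_t i = 0; i < N; ++i) {
        correct += (dst[i] == src[i]);
    }
    return correct;
}
```

Under these assumptions, `sketch_count_correct(4)` returns 16 (every element arrives) while `sketch_count_correct(1)` returns 4, i.e. 1/4 of the elements, which matches the ratio described above.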

This is the imp.cl code which I'm trying to compile:

#include "TTL.h"
#include "memrefs.h"
#include "ocl_defs.h"

void _imp_addition_i32___kernel(Memref3D_I32_G* v0, Memref3D_I32_G* v1, Memref3D_I32_G* v2,
                                Memref3D_I32_L* b0, Memref3D_I32_L* b1, Memref3D_I32_L* b2,
                                Memref3D_I32_L* b3){

  // /******************* import v0 -> b0 ***********************/
  TTL_event_t event0 = TTL_get_event();
  TTL_layout_t layout0 = TTL_create_layout(v0->strides[1]/*row_spacing*/, v0->strides[0]/*plane_spacing*/);
  TTL_shape_t shape0 = TTL_create_shape(v0->sizes[0] /*width*/, v0->sizes[1] /*height*/, v0->sizes[2] /*depth*/);

  const TTL_const_ext_tensor_t ext_tensor0 = TTL_create_const_ext_tensor(v0->aligned, shape0, layout0, 4/*elem_size*/);
  const TTL_int_tensor_t int_tensor0 = TTL_create_int_tensor(b0->aligned, shape0, layout0, 4/*elem_size*/);

  TTL_import(int_tensor0, ext_tensor0, &event0);
  TTL_wait(1 /*num_events*/, &event0);

  // /******************* import v1 -> b1 ***********************/
  TTL_event_t event1 = TTL_get_event();
  TTL_layout_t layout1 = TTL_create_layout(v1->strides[1]/*row_spacing*/, v1->strides[0]/*plane_spacing*/);
  TTL_shape_t shape1 = TTL_create_shape(v1->sizes[0] /*width*/, v1->sizes[1] /*height*/, v1->sizes[2] /*depth*/);

  const TTL_const_ext_tensor_t ext_tensor1 = TTL_create_const_ext_tensor(v1->aligned, shape1, layout1, 4/*elem_size*/);
  const TTL_int_tensor_t int_tensor1 = TTL_create_int_tensor(b1->aligned, shape1, layout1, 4/*elem_size*/);

  TTL_import(int_tensor1, ext_tensor1, &event1);
  TTL_wait(1 /*num_events*/, &event1);

  // /******************* export b2 -> v2 ***********************/
  TTL_event_t event2 = TTL_get_event();
  TTL_layout_t layout2 = TTL_create_layout(v2->strides[1]/*row_spacing*/, v2->strides[0]/*plane_spacing*/);
  TTL_shape_t shape2 = TTL_create_shape(v2->sizes[0] /*width*/, v2->sizes[1] /*height*/, v2->sizes[2] /*depth*/);

  const TTL_ext_tensor_t ext_tensor2 = TTL_create_ext_tensor(v2->aligned, shape2, layout2, 4/*elem_size*/);
  const TTL_const_int_tensor_t int_tensor2 = TTL_create_const_int_tensor(b2->aligned, shape2, layout2, 4/*elem_size*/);

  TTL_export(int_tensor2, ext_tensor2, &event2);
  TTL_wait(1 /*num_events*/, &event2);
  return;
}

The memref types Memref3D_I32_G and Memref3D_I32_L are declared in the 'memrefs.h' header like this:

typedef struct __attribute__((__packed__)) Memref3D_I32_L
{
  __local void *allocated;
  __local int32_t *aligned;
  int offset;
  int sizes[dims];
  int strides[dims];
} Memref3D_I32_L;

Memref3D_I32_G is declared the same way, but with __global in place of __local.
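For reference, the global variant would presumably look like the sketch below. It is written so that it also compiles as host C: the address-space qualifiers are stubbed out when not building as OpenCL C, and `dims` is assumed to be 3 (the real definition is not shown in this post).

```c
#include <assert.h>
#include <stdint.h>

/* Outside OpenCL C the address-space qualifiers do not exist, so they are
 * stubbed out here. This is only an assumption to allow a host build. */
#ifndef __OPENCL_VERSION__
#define __global
#define __local
#endif

/* Assumed value; the real macro is presumably defined elsewhere. */
#ifndef dims
#define dims 3
#endif

typedef struct __attribute__((__packed__)) Memref3D_I32_G
{
  __global void *allocated;  /* pointer returned by the allocator */
  __global int32_t *aligned; /* aligned pointer, passed to the TTL tensor */
  int offset;
  int sizes[dims];
  int strides[dims];
} Memref3D_I32_G;

/* With the packed attribute there is no padding between the members. */
static_assert(sizeof(Memref3D_I32_G) ==
                  2 * sizeof(void *) + (1 + 2 * dims) * sizeof(int),
              "packed memref should contain no padding");
```

Since `aligned` has type int32_t*, writing sizeof(*v0->aligned) instead of the literal 4 would tie the element size passed to the tensor constructors to the memref's element type.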

Would be glad for your help/support. Thanks, Amir Bishara

Oct 15 '23 05:10 amirBish

We have a problem of sorts here - I broke a large patch into smaller patches, as I should have done, and in doing so broke this feature. It is correct when the whole stack lands, but fixing it here is actually difficult because of the way macros are used.

Can you try with https://github.com/KhronosGroup/OpenCL-TTL/pull/14 and see how that works for you?

Oct 24 '23 18:10 chrisgearing