cutlass [QST] Adding a flag in Tensor Ref Class

[QST] Adding a flag in Tensor Ref Class

Open IzanCatalan opened this issue 3 weeks ago • 4 comments

What is your question? Hi, I want to add an extra member to the TensorRef class: a flag, in the form of an integer pointer, to be accessed when the convolution is performed in https://github.com/NVIDIA/cutlass/blob/24f991e87930e1159f1f5a47e329d43bcfbd76b9/include/cutlass/conv/kernel/implicit_gemm_convolution.h:

  /// Pointer
  Element* ptr_;
  int* check;

The flag is meant to record whether the GPU has used the tensor. I have also modified the constructors of the class; since it is only intended to be used on the GPU, I allocate the flag with cudaMalloc:

  /// Constructs a TensorRef with a pointer and layout object.
  CUTLASS_HOST_DEVICE
  TensorRef(
    Element *ptr,                   ///< pointer to start of tensor
    Layout const &layout            ///< layout object containing stride and mapping function
  ):
    ptr_(ptr), layout_(layout), check(nullptr){ 
      cudaMalloc((void**)&check, sizeof(int)); 
      cudaMemset(check, 0, sizeof(int)); // Initialize to 0
    }


  /// Converting constructor from TensorRef to non-constant data.
  template<typename _Magic = int>
  CUTLASS_HOST_DEVICE
  TensorRef(
    NonConstTensorRef const &ref,              ///< TensorRef to non-const data
    ///SFINAE trick to avoid creating a copy-constructor when Element_ is already non-const
    _Magic magic = (typename platform::enable_if< ! platform::is_same<NonConstTensorRef, TensorRef<Element_, Layout_> >::value, _Magic>::type)0
  ):
    ptr_(ref.data()), layout_(ref.layout()), check(nullptr){ 
      cudaMalloc((void**)&check, sizeof(int)); 
      cudaMemset(check, 0, sizeof(int)); // Initialize to 0
    }

Finally, I have added a new accessor, similar to data(), so that the pointer can be passed to the convolution parameters in the same way as is done in implicit_gemm_convolution.h:

  /// Returns the check object
  CUTLASS_HOST_DEVICE
  int * isChecked() const {return check;}

As a result, I have also modified several parts of the convolution kernel, storing the pointer in a new Params member, first_call: https://github.com/NVIDIA/cutlass/blob/24f991e87930e1159f1f5a47e329d43bcfbd76b9/include/cutlass/conv/kernel/implicit_gemm_convolution.h:

struct Params {
    ...
    int *first_call;
    ...

    //
    // Methods
    //

    CUTLASS_HOST_DEVICE
    Params(): swizzle_log_tile(0), gemm_k_iterations(0) { }

    /// 
    CUTLASS_HOST_DEVICE
    Params(
      Arguments const &args,
      int *semaphore = nullptr
    ):
     ...
      first_call(args.ref_B.isChecked()),
     ...
    {

Later, in the operator() function, since neither ptr_ nor check is a const pointer in the TensorRef class, both can be accessed.

Accessing ptr_B works fine, but the program suddenly hangs when I access and modify first_call.

I am executing example 16 to check the implementation.

This is the code where I modify the flag through the first_call parameter:

  void operator()(Params const &params, SharedStorage &shared_storage) {
    int threadId = threadIdx.x + blockIdx.x * blockDim.x;
    if (threadId == 0) {
      if (*params.first_call == 0) {
        *params.first_call = 1;
      }
    }

first_call is modified, which I confirmed by printing it after this piece of code, but when execution reaches line 343 the process hangs. It is using the GPU; I have checked that, but I don't know why it stops there.

I think it may be some kind of memory-management problem caused by how I allocate memory in the constructor of the TensorRef class. Maybe it is not the proper way of doing it, because I never free the new integer pointer.
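To make that suspicion concrete, here is a minimal host-only C++ sketch. It is not CUTLASS code: AllocatingRef, NonOwningRef, mock_device_alloc and mock_device_free are hypothetical names, and the mock allocator only stands in for cudaMalloc/cudaFree so the allocations can be counted. It shows that a reference-like type which allocates in its constructor creates one unfreed flag per construction, while a type that stores a caller-owned flag pointer does not. It does not show whether this is what causes the hang.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical bookkeeping: counts flags that are allocated but not yet freed.
static int g_live_allocations = 0;

// Hypothetical stand-ins for cudaMalloc + cudaMemset and for cudaFree.
int* mock_device_alloc() {
  ++g_live_allocations;
  return static_cast<int*>(std::calloc(1, sizeof(int)));  // zero-initialized
}
void mock_device_free(int* p) {
  std::free(p);
  --g_live_allocations;
}

// Pattern from the question: the view allocates its flag and never frees it.
struct AllocatingRef {
  float* ptr_;
  int* check;
  explicit AllocatingRef(float* ptr) : ptr_(ptr), check(mock_device_alloc()) {}
  int* isChecked() const { return check; }
};

// Assumed alternative: the view only stores a flag that the caller owns.
struct NonOwningRef {
  float* ptr_;
  int* check;
  NonOwningRef(float* ptr, int* flag) : ptr_(ptr), check(flag) {}
  int* isChecked() const { return check; }
};

// Returns how many flags are still allocated after building `count` views.
int leaked_after_allocating_refs(int count) {
  int before = g_live_allocations;
  float data[4] = {};
  for (int i = 0; i < count; ++i) {
    AllocatingRef ref(data);  // every construction allocates a fresh flag
    *ref.isChecked() = 1;     // writes to a flag no other view can see
  }
  return g_live_allocations - before;
}

int leaked_after_non_owning_refs(int count) {
  int before = g_live_allocations;
  float data[4] = {};
  int* flag = mock_device_alloc();  // allocated once by the owner
  for (int i = 0; i < count; ++i) {
    NonOwningRef ref(data, flag);   // all views share the same flag
    *ref.isChecked() = 1;
  }
  assert(*flag == 1);               // the owner observes the write
  mock_device_free(flag);           // freed once by the owner
  return g_live_allocations - before;
}
```

If this reading is right, the flag would belong in whatever object owns the tensor memory, with the reference type only carrying the pointer; that is an assumption about a possible design, not something confirmed for CUTLASS.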

Should I instead modify the HostTensor and device_memory classes, which are the ones used to define tensors from the host, as described in example [16]?

cutlass::HostTensor<ElementInputB, LayoutInputB> tensor_b(options.filter_size);

Any help would be appreciated.

Izan.

Feb 05 '25 17:02 IzanCatalan
