cutlass icon indicating copy to clipboard operation
cutlass copied to clipboard

[QST]iterator store interface[store(Fragment &frag,TensorCoord const &tile_offset)] not offset in units of whole tiles

Open zhiyu-deep opened this issue 1 year ago • 13 comments

In file include\cutlass\gemm\warp\mma_tensor_op_tile_iterator.h, In the store interface of iterator, As explained in the comments, "stores a tile with a logical offset in units of whole tiles", However, TensorRef is actually used for offset which is offset in units of elements. store_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);

If I understand correctly, it should bahave like the add_tile_offset interface, multiplying tensorCoord by warp shape? ref_.add_coord_offset(tile_offset * make_Coord(Shape::kRow, Shape::kColumn));

zhiyu-deep avatar Jan 20 '24 15:01 zhiyu-deep

could you please point me to the line of code of store and load? There are several in this file.

It is possible that store implementation is buggy, since it is rarely used.

hwu36 avatar Jan 22 '24 20:01 hwu36

I have only noticed the file include\cutlass\gemm\warp\mma_tensor_op_tile iterator.h currently. All store in it have this problem. For example, line 2634. I also think it may be because this interface is rarely used, but there seems to be some logical problem.

zhiyu-deep avatar Jan 23 '24 14:01 zhiyu-deep

line 2634 belongs to accumulator iterator. in this iterator, both store and load are tested by the unit tests. multiplicand iterator only uses load.

store related code is

/// Stores a fragment to memory with additional pointer offset
  CUTLASS_DEVICE
  void store_with_byte_offset(
    Fragment const &frag,                       ///< fragment to store from the tensor
    Index byte_offset) const {                  ///< store a tile with a linear offset

    store_with_pointer_offset(byte_offset / sizeof(Element));
  }

  /// Stores a fragment to memory with logical offset in units of whole tiles.
  CUTLASS_DEVICE
  void store(
    Fragment &frag,                             ///< fragment to store to the tensor
    TensorCoord const &tile_offset) const {     ///< stores a tile with a logical offset in units of whole tiles

    store(frag, tile_offset, 0);
  }

  /// Stores a fragment from memory with logical offset in units of whole tiles.
  CUTLASS_DEVICE
  void store(
      /// fragment to store to the tensor
      Fragment const &frag,
      /// stores a tile with a logical offset in units of whole tiles
      TensorCoord const &tile_offset,
      /// stores a tile with a logical offset AND a pointer offset
      Index pointer_offset) const {
    store_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
  }
};

load related code is

  /// Loads a fragment from memory with additional logical offset
  CUTLASS_DEVICE
  void load_with_byte_offset(
    Fragment &frag,                             ///< fragment to load from the tensor
    Index byte_offset) const {                  ///< loads a tile with a linear offset

    load_with_pointer_offset(byte_offset / sizeof(Element));
  }

  /// Loads a fragment from memory with logical offset in units of whole tiles.
  CUTLASS_DEVICE
  void load(
    Fragment &frag,                             ///< fragment to load from the tensor
    TensorCoord const &tile_offset) const {     ///< loads a tile with a logical offset in units of whole tiles

    load(frag, tile_offset, 0);
  }

  /// Loads a fragment from memory with logical offset in units of whole tiles.
  CUTLASS_DEVICE
  void load(
    Fragment &frag,                             ///< fragment to load from the tensor
    TensorCoord const &tile_offset,             ///< loads a tile with a logical offset in units of whole tiles
    Index pointer_offset) const {               ///< loads a tile with a logical offset AND a pointer offset

    load_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
  }

they match pretty well.

hwu36 avatar Jan 23 '24 18:01 hwu36

thanks for your reply, yes, in accumulator iterator, store and load are match, there are several groups of store and load, i have no doubt in load_with_byte_offset and store_with_byte_offset, but in below two interface which are about loading or storing a tile with a logical offset in units of whole tiles:

CUTLASS_DEVICE
  void load(
    Fragment &frag,                             ///< fragment to load from the tensor
    TensorCoord const &tile_offset,             ///< loads a tile with a logical offset in units of whole tiles
    Index pointer_offset) const {               ///< loads a tile with a logical offset AND a pointer offset

    load_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
  }

CUTLASS_DEVICE
  void store(
      /// fragment to store to the tensor
      Fragment const &frag,
      /// stores a tile with a logical offset in units of whole tiles
      TensorCoord const &tile_offset,
      /// stores a tile with a logical offset AND a pointer offset
      Index pointer_offset) const {
    store_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
  }

both use ref_.offset(tile_offset), which is offset element by element, but actually, we want to offset warp tile by warp tile, such as:

MmaTensorOpAccumulatorTileIterator &add_tile_offset(TensorCoord const &tile_offset) {

    ref_.add_coord_offset(tile_offset * make_Coord(Shape::kRow, Shape::kColumn));

    return *this;
  }

i think the logic in add_tile_offset is the correct behaviour, tile_offset * make_Coord(Shape::kRow, Shape::kColumn).

If my understanding is wrong, please help point it out, thanks.

zhiyu-deep avatar Jan 24 '24 15:01 zhiyu-deep

i think you are correct. store or load with offset is not used.

hwu36 avatar Jan 24 '24 16:01 hwu36

@zhiyu-deep has your issue been resolved?

mnicely avatar Feb 22 '24 15:02 mnicely

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

zhiyu-deep avatar Feb 22 '24 15:02 zhiyu-deep

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Mar 23 '24 16:03 github-actions[bot]

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

zhiyu-deep avatar Mar 23 '24 16:03 zhiyu-deep

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Apr 22 '24 17:04 github-actions[bot]

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

zhiyu-deep avatar Apr 22 '24 17:04 zhiyu-deep

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar May 22 '24 18:05 github-actions[bot]