[QST] Iterator store interface [store(Fragment &frag, TensorCoord const &tile_offset)] does not offset in units of whole tiles
In the file include\cutlass\gemm\warp\mma_tensor_op_tile_iterator.h, the store interface of the iterator is documented as "stores a tile with a logical offset in units of whole tiles". However, the offset is actually computed through TensorRef, which works in units of elements:
store_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
If I understand correctly, it should behave like the add_tile_offset interface and multiply the TensorCoord by the warp tile shape:
ref_.add_coord_offset(tile_offset * make_Coord(Shape::kRow, Shape::kColumn));
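For comparison, here is a sketch of the behaviour I would expect (a hypothetical fix of my own, not the current CUTLASS code), reusing Shape and make_Coord the same way add_tile_offset does:

/// Hypothetical fix (sketch only): convert the whole-tile offset into an
/// element coordinate using the warp tile shape before computing the linear
/// offset, mirroring add_tile_offset.
CUTLASS_DEVICE
void store(
    Fragment const &frag,
    TensorCoord const &tile_offset,
    Index pointer_offset) const {
  store_with_pointer_offset(
      frag,
      ref_.offset(tile_offset * make_Coord(Shape::kRow, Shape::kColumn)) +
          pointer_offset);
}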
Could you please point me to the lines of code for store and load? There are several in this file.
It is possible that the store implementation is buggy, since it is rarely used.
So far I have only looked at the file include\cutlass\gemm\warp\mma_tensor_op_tile_iterator.h. All the store implementations in it have this problem; for example, line 2634. I also suspect it is because this interface is rarely used, but there still seems to be a logical problem.
Line 2634 belongs to the accumulator iterator. In this iterator, both store and load are exercised by the unit tests. The multiplicand iterator only uses load.
The store-related code is:

/// Stores a fragment to memory with additional pointer offset
CUTLASS_DEVICE
void store_with_byte_offset(
    Fragment const &frag,       ///< fragment to store from the tensor
    Index byte_offset) const {  ///< store a tile with a linear offset
  store_with_pointer_offset(frag, byte_offset / sizeof(Element));
}

/// Stores a fragment to memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void store(
    Fragment &frag,                          ///< fragment to store to the tensor
    TensorCoord const &tile_offset) const {  ///< stores a tile with a logical offset in units of whole tiles
  store(frag, tile_offset, 0);
}

/// Stores a fragment to memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void store(
    /// fragment to store to the tensor
    Fragment const &frag,
    /// stores a tile with a logical offset in units of whole tiles
    TensorCoord const &tile_offset,
    /// stores a tile with a logical offset AND a pointer offset
    Index pointer_offset) const {
  store_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
}
The load-related code is:

/// Loads a fragment from memory with additional pointer offset
CUTLASS_DEVICE
void load_with_byte_offset(
    Fragment &frag,             ///< fragment to load from the tensor
    Index byte_offset) const {  ///< loads a tile with a linear offset
  load_with_pointer_offset(frag, byte_offset / sizeof(Element));
}

/// Loads a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void load(
    Fragment &frag,                          ///< fragment to load from the tensor
    TensorCoord const &tile_offset) const {  ///< loads a tile with a logical offset in units of whole tiles
  load(frag, tile_offset, 0);
}

/// Loads a fragment from memory with logical offset in units of whole tiles.
CUTLASS_DEVICE
void load(
    Fragment &frag,                  ///< fragment to load from the tensor
    TensorCoord const &tile_offset,  ///< loads a tile with a logical offset in units of whole tiles
    Index pointer_offset) const {    ///< loads a tile with a logical offset AND a pointer offset
  load_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
}
They match pretty well.
Thanks for your reply. Yes, in the accumulator iterator, store and load match, and there are several pairs of store and load. I have no doubt about load_with_byte_offset and store_with_byte_offset, but consider the two interfaces below, which load or store a tile with a logical offset in units of whole tiles:
CUTLASS_DEVICE
void load(
    Fragment &frag,                  ///< fragment to load from the tensor
    TensorCoord const &tile_offset,  ///< loads a tile with a logical offset in units of whole tiles
    Index pointer_offset) const {    ///< loads a tile with a logical offset AND a pointer offset
  load_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
}

CUTLASS_DEVICE
void store(
    /// fragment to store to the tensor
    Fragment const &frag,
    /// stores a tile with a logical offset in units of whole tiles
    TensorCoord const &tile_offset,
    /// stores a tile with a logical offset AND a pointer offset
    Index pointer_offset) const {
  store_with_pointer_offset(frag, ref_.offset(tile_offset) + pointer_offset);
}
Both use ref_.offset(tile_offset), which offsets element by element, but we actually want to offset warp tile by warp tile. For example, with Shape::kRow == 32, a tile_offset of {1, 0} should advance by 32 rows, yet ref_.offset(tile_offset) advances by only one row. Compare add_tile_offset:
MmaTensorOpAccumulatorTileIterator &add_tile_offset(TensorCoord const &tile_offset) {
  ref_.add_coord_offset(tile_offset * make_Coord(Shape::kRow, Shape::kColumn));
  return *this;
}
I think the logic in add_tile_offset is the correct behaviour: tile_offset * make_Coord(Shape::kRow, Shape::kColumn).
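For what it is worth, here is a caller-side sketch (my own code, not part of CUTLASS) that gets the intended tile-unit semantics out of the interfaces that are known to work, assuming an iterator type that provides add_tile_offset and a plain load(Fragment &):

// Hypothetical workaround (sketch only): emulate "load at a whole-tile
// offset" by advancing a copy of the iterator with the tested
// add_tile_offset, then doing a plain load at the new position.
template <typename Iterator, typename Fragment, typename TensorCoord>
CUTLASS_DEVICE
void load_at_tile_offset(
    Iterator iter,  // taken by value so the caller's iterator is unchanged
    Fragment &frag,
    TensorCoord const &tile_offset) {
  iter.add_tile_offset(tile_offset);  // offset in units of whole tiles
  iter.load(frag);                    // plain load at the current position
}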
If my understanding is wrong, please point it out. Thanks.
I think you are correct. store or load with an offset is not used.
@zhiyu-deep has your issue been resolved?
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.