[DOC]: Clarify the use of `ct.num_tiles` in the first example that everyone will read, i.e., MatMul.py
How would you describe the priority of this documentation request?
High
Please provide a link or source to the relevant docs
https://github.com/NVIDIA/cutile-python/blob/main/samples/MatMul.py#L61-L65
Describe the problems in the documentation
> from matrix A's shape, assuming A's shape is conceptually (M_tiles, K_tiles),
> and then implicitly performs ceiling division by `tk` to get the number of K-tiles.
If A's shape is already expressed in tiles and so already gives K_tiles, why is a ceiling division needed to get K_tiles? The comment seems to be incorrect.
A is an MxK global tensor, tiled with tm x tk tiles, and `ct.num_tiles` is evaluated along axis=1. Since `ct.num_tiles` returns a single i32, the axis has to be specified when the tensor and the tiling are multi-dimensional.
The comment and code in BatchMatmul.py are better. The comment reads correctly, and the code line is cleaner because it does not pass the modes that are irrelevant to computing num_k_tiles.
num_k_tiles = ct.cdiv(A.shape[2], tk)
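To make the point concrete, here is a plain-Python sketch of the arithmetic I believe is intended. It does not use cuTile at all: `cdiv` below is a local stand-in written for illustration (not `ct.cdiv` itself), and the sizes are made-up example values.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers (local stand-in, not ct.cdiv)."""
    return (a + b - 1) // b


# Assumption for illustration: A's shape is (M, K) in elements, not in tiles.
M, K = 1000, 300
tm, tk = 128, 64

num_m_tiles = cdiv(M, tm)  # 1000 / 128 = 7.81..., rounded up to 8
num_k_tiles = cdiv(K, tk)  # 300 / 64 = 4.68..., rounded up to 5 (last tile is partial)

print(num_m_tiles, num_k_tiles)
```

If A's shape were already (M_tiles, K_tiles), no division by `tk` would be needed, which is why the MatMul.py comment reads as contradictory.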
Along the same lines, it seems I cannot use ct.num_tiles to write something like the following:
num_m_tiles, num_k_tiles = ct.num_tiles(A, shape=(tm, tk))
or
_, num_k_tiles = ct.num_tiles(A, shape=(tm, tk))
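For clarity, the behaviour I have in mind can be sketched in plain Python. `num_tiles_per_axis` is a hypothetical helper invented for this issue, not an existing cuTile function, and it works on plain integer shapes rather than on cuTile arrays.

```python
from typing import Sequence, Tuple


def num_tiles_per_axis(shape: Sequence[int], tile_shape: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical helper: tile count along every axis via ceiling division."""
    if len(shape) != len(tile_shape):
        raise ValueError("shape and tile_shape must have the same rank")
    return tuple((dim + tile - 1) // tile for dim, tile in zip(shape, tile_shape))


# Example values only: an (M, K) = (1000, 300) array tiled with (tm, tk) = (128, 64).
num_m_tiles, num_k_tiles = num_tiles_per_axis((1000, 300), (128, 64))
_, only_k_tiles = num_tiles_per_axis((1000, 300), (128, 64))
```

Returning one count per axis would let the caller unpack or discard the axes it does not need, instead of passing `axis` explicitly.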
(Optional) Propose a correction
No response
Contributing Guidelines
- [x] I agree to follow cuTile Python's contributing guidelines
- [x] I have searched the open documentation and have found no duplicates for this documentation request