[QST]Question about the picture in documentation `Efficient GEMM in CUDA`
I noticed the figure in this document: https://github.com/NVIDIA/cutlass/blob/main/media/docs/efficient_gemm.md
The partition from global memory into shared-memory blocks is easy to understand.
My question concerns the second part: the Thread Block Tile.
In the picture, it seems to use an outer product, which takes a column of A and a row of B to produce a contribution to the matrix C.
A.shape (M, 1), B.shape (1, N) -> C.shape (M, N)
Is that correct?
If so, why is it different from the first-level block partition?
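To make the outer-product view concrete, here is a small illustrative sketch in plain Python (not CUTLASS code): a GEMM over the K dimension can be computed as a sum of K rank-1 (outer-product) updates, where step k takes column k of A (M x 1) and row k of B (1 x N) and accumulates their outer product into C (M x N). This is the same matrix as the usual inner-product formulation, just with the loops reordered.

```python
def gemm_by_outer_products(A, B):
    """Accumulate C as a sum of K rank-1 (outer-product) updates."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for k in range(K):                          # loop over the K dimension
        for i in range(M):                      # column k of A
            for j in range(N):                  # row k of B
                C[i][j] += A[i][k] * B[k][j]    # rank-1 update into C
    return C


def gemm_by_inner_products(A, B):
    """Reference: the usual dot-product formulation gives the same C."""
    M, K, N = len(A), len(A[0]), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]
```

Both loop orders produce the identical result; the outer-product order is attractive on GPUs because each (column-of-A, row-of-B) pair can be held in registers while it updates the whole accumulator tile, which is the pattern the thread-block-tile figure is depicting.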
how to find api key for ollama models ?
For the ollama configuration in config2.yaml, you can refer to this document: ollama-api
If it is a local ollama model, you can fill in any string as the API key, but it cannot be an empty string, left unfilled, or the placeholder YOUR_API_KEY.
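As an illustration only, an entry for a local ollama model might look like the fragment below. The exact key names depend on the project's config schema, so treat every field name here as an assumption and check the ollama-api document for the real layout:

```yaml
# Hypothetical config2.yaml fragment -- key names are assumptions,
# consult your project's documentation for the actual schema.
model:
  provider: ollama
  base_url: http://localhost:11434   # default local ollama endpoint
  api_key: ollama-local              # any non-empty string works locally;
                                     # must NOT be empty or "YOUR_API_KEY"
```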
This issue has no activity in the past 30 days. Please comment on the issue if you have anything to add.
Tried this, but I got a new error.
Due to the lack of updates or replies from the user for a long time, we will close this issue. Please reopen it if necessary.