[QST]Question about the picture in documentation `Efficient GEMM in CUDA`
I noticed the figure in this document: https://github.com/NVIDIA/cutlass/blob/main/media/docs/efficient_gemm.md
The partition from global memory into shared-memory blocks is easy to understand.
My question concerns the second part: the Thread Block Tile.
In the picture, it seems to use an outer product, which takes a column of A and a row of B to produce a contribution to the matrix C.
A.shape (M, 1), B.shape (1, N) -> C.shape (M, N)
Is that correct?
If so, why is it different from the first-level block partition?
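To make the outer-product view concrete, here is a small illustrative sketch in plain Python (not CUTLASS code): a GEMM over the K dimension can be computed as a sum of K rank-1 (outer-product) updates, where step k takes column k of A (M x 1) and row k of B (1 x N) and accumulates their outer product into C (M x N). This is the same matrix as the usual inner-product formulation, just with the loops reordered.

```python
def gemm_by_outer_products(A, B):
    """Accumulate C as a sum of K rank-1 (outer-product) updates."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for k in range(K):                          # loop over the K dimension
        for i in range(M):                      # column k of A
            for j in range(N):                  # row k of B
                C[i][j] += A[i][k] * B[k][j]    # rank-1 update into C
    return C


def gemm_by_inner_products(A, B):
    """Reference: the usual dot-product formulation gives the same C."""
    M, K, N = len(A), len(A[0]), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]
```

Both loop orders produce the identical result; the outer-product order is attractive on GPUs because each (column-of-A, row-of-B) pair can be held in registers while it updates the whole accumulator tile, which is the pattern the thread-block-tile figure is depicting.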
how to find api key for ollama models ?
For the ollama configuration in config2.yaml, you can refer to this document: ollama-api
If it is a local ollama model, you can fill in any string as the API key, but it cannot be an empty string, left unfilled, or the placeholder YOUR_API_KEY.
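As an illustration only, an entry for a local ollama model might look like the fragment below. The exact key names depend on the project's config schema, so treat every field name here as an assumption and check the ollama-api document for the real layout:

```yaml
# Hypothetical config2.yaml fragment -- key names are assumptions,
# consult your project's documentation for the actual schema.
model:
  provider: ollama
  base_url: http://localhost:11434   # default local ollama endpoint
  api_key: ollama-local              # any non-empty string works locally;
                                     # must NOT be empty or "YOUR_API_KEY"
```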
This issue has no activity in the past 30 days. Please comment on the issue if you have anything to add.
Tried this, but I got a new error.
Due to the lack of updates or replies from the user for a long time, we will close this issue. Please reopen it if necessary.