Minh Quan Ho
Let us call `macro-block` an MC-by-NC block, which is traditionally processed by a call to `macro-kernel`. Let us call `edge-macro-block` either an ME-by-NC or an MC-by-NE block, where `0 < ME <...
Details:
- In some multi-threading schemes, JR_NT and IR_NT may leave some threads idle, performing no computation.
- This commit detects such situations and implements a collapse of the JR/IR loops....
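The effect of collapsing the two loops can be illustrated with a minimal sketch. This is not the code from the commit: the function names and the flat-index mapping below are hypothetical, and the sketch only assumes that the JR and IR loops are normally partitioned independently over a JR_NT-by-IR_NT thread grid.

```c
/* Hypothetical illustration, not BLIS code.
   n_jr, n_ir   : number of iterations of the JR and IR loops
   jr_nt, ir_nt : number of threads assigned to each loop (JR_NT, IR_NT) */

/* Nested scheme: each loop is partitioned on its own, so at most
   min(n_jr, jr_nt) * min(n_ir, ir_nt) threads receive any work. */
int busy_threads_nested(int n_jr, int n_ir, int jr_nt, int ir_nt)
{
    int busy_jr = n_jr < jr_nt ? n_jr : jr_nt;
    int busy_ir = n_ir < ir_nt ? n_ir : ir_nt;
    return busy_jr * busy_ir;
}

/* Collapsed scheme: the n_jr * n_ir iterations form one flat range that is
   spread over all jr_nt * ir_nt threads. */
int busy_threads_collapsed(int n_jr, int n_ir, int jr_nt, int ir_nt)
{
    int iters = n_jr * n_ir;
    int nt    = jr_nt * ir_nt;
    return iters < nt ? iters : nt;
}

/* Recover the original loop indices from a flat iteration index,
   assuming the IR loop is the inner (fastest-varying) one. */
int collapsed_jr(int idx, int n_ir) { return idx / n_ir; }
int collapsed_ir(int idx, int n_ir) { return idx % n_ir; }
```

For example, with 2 JR iterations, 8 IR iterations, JR_NT = 4 and IR_NT = 2, the nested scheme keeps only 4 of the 8 threads busy, while the collapsed range of 16 iterations gives work to all 8.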
BLIS internal layers mostly re-clone and re-alias `obj_t a, b, c` at every level (`bli_?_front, bli_l3_thread_entry, bli_gemm_int` as well as `bli_?_blk_var?`). This increases the management overhead (`obj_t` aliasing) and consumes...
Details:
- When a pool reinit is triggered by a thread asking for a bigger block, the pool is re-initialized and all blocks are freed, including potential ones already checked...
- Add `--enable-dma` option in configure script
- DMA-specific control-trees for GEMM and TRSM families
- Reference DMA backend implementation based on `bli_pthread` and `memcpy`
- Vendor DMA library to...
- Currently, the MC (or NC) size of a pool block is extended by max(MR, NR) to cover possible invalid (speculative) prefetches by micro-kernels, typically for their convenience in the last KC-iteration....
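To make the cost of that padding concrete, here is a minimal sketch of the arithmetic. The function name, parameter names, and exact formula are assumptions for illustration only; the actual BLIS pool-block size computation may include alignment and other terms not shown here.

```c
#include <stddef.h>

/* Hypothetical illustration, not BLIS code: byte size of a packed-panel
   pool block of mc rows by kc columns, where the mc dimension is extended
   by max(mr, nr) so that a speculative prefetch past the last micro-panel
   still lands inside the allocation. */
size_t padded_block_bytes(size_t mc, size_t kc,
                          size_t mr, size_t nr, size_t elem_size)
{
    size_t pad = mr > nr ? mr : nr;   /* max(MR, NR) */
    return (mc + pad) * kc * elem_size;
}
```

With the illustrative values MC = 256, KC = 128, MR = 8, NR = 6 and 8-byte elements, the padded block is 270336 bytes, 8192 bytes more than the unpadded 262144.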
Downloading a single file works fine, but when I choose to download a folder (as a zip), the opened link is broken. Cozy-light then catches an exception and crashes: ``` python...