blis
Insert timing infrastructure into BLIS
In principle, the control tree can be retrofitted with fields that threads use to record timing information within each loop and within the packing functions. The recorded data can then be used later for performance analysis. Thanks to Devangi Parikh for suggesting this.
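
As a rough illustration of the idea, the sketch below shows one way a tree node could carry per-thread timing fields. Everything in it is hypothetical: `timed_node_t`, `SKETCH_MAX_THREADS`, `sketch_clock()`, and the `timed_node_*` helpers are names invented for this example and are not part of BLIS, and the real control tree structure may look quite different. It also assumes each thread has a small integer id and writes only to its own slot, so no locking is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical upper bound on thread ids; an assumption of this sketch. */
#define SKETCH_MAX_THREADS 64

/* Hypothetical stand-in for a control tree node. It is NOT BLIS's actual
   node type; it only shows where timing fields could live. */
typedef struct timed_node_s
{
    const char*          label;     /* e.g. "loop 3" or "pack A" */
    struct timed_node_s* sub_node;  /* next level of the tree, or NULL */

    /* Timing fields: one slot per thread, written only by that thread.
       Adjacent slots may share a cache line; a real design might pad them. */
    double        elapsed[SKETCH_MAX_THREADS]; /* accumulated seconds */
    unsigned long calls[SKETCH_MAX_THREADS];   /* number of timed visits */
} timed_node_t;

/* Wall-clock time in seconds, via C11 timespec_get(). */
static inline double sketch_clock(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Add `seconds` to the slot that belongs to thread `tid`. */
static inline void timed_node_add(timed_node_t* node, size_t tid, double seconds)
{
    assert(tid < SKETCH_MAX_THREADS);
    node->elapsed[tid] += seconds;
    node->calls[tid]   += 1;
}

/* Close a timed region that began at `t_start` (from sketch_clock()). */
static inline void timed_node_stop(timed_node_t* node, size_t tid, double t_start)
{
    timed_node_add(node, tid, sketch_clock() - t_start);
}

/* Sum the time recorded by all threads at this node. */
static inline double timed_node_total(const timed_node_t* node)
{
    double total = 0.0;
    for (size_t i = 0; i < SKETCH_MAX_THREADS; ++i)
        total += node->elapsed[i];
    return total;
}
```

Under these assumptions, a thread would call `sketch_clock()` just before entering a loop body or packing routine and `timed_node_stop()` just after it. Once the operation completes, a single thread could walk the tree through `sub_node` and report `timed_node_total()` for each node.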