Devin Matthews comments

Results: 264 comments of Devin Matthews

Netlib BLAS test failures should cause CI failure

> I recommend that we insert a comment somewhere (in the .appveyor.yml file?) that will remind future readers of this issue.

Done

Netlib BLAS test failures should cause CI failure

Apparently *all* test failures are ignored in the Makefile (prepended with "- "). @fgvanzee can you remind me what the rationale for this is? Makes it hard for other people...

Do level 1 and level 2 APIs have multithreading support in BLIS

No, L1 and L2 operations are still single-threaded.

slow generic implementation

The `generic` implementation will have better cache behavior than netlib BLAS, but will also do packing which will slow things down for small and medium-sized matrices. It's not totally clear...
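For readers unfamiliar with packing, here is a minimal sketch of the idea, not BLIS's actual packing routine. The names `pack_a_panel` and `PACK_MR` are hypothetical, and the column-major layout with leading dimension `lda` is an assumption made for illustration. The copy touches every element of the panel once, which is cheap relative to the multiply for large matrices but can be a noticeable fraction of the work for small ones.

```c
#include <stddef.h>

/* Hypothetical panel height, chosen only for illustration. */
enum { PACK_MR = 4 };

/*
 * Copy the first m (<= PACK_MR) rows and k columns of a column-major
 * matrix A (leading dimension lda) into a contiguous buffer laid out
 * as packed[p * PACK_MR + i]. Rows beyond m are zero-padded so a
 * kernel can always assume a full PACK_MR x k panel.
 */
void pack_a_panel(size_t m, size_t k, const double *a, size_t lda,
                  double *packed)
{
    for (size_t p = 0; p < k; ++p) {
        for (size_t i = 0; i < PACK_MR; ++i) {
            packed[p * PACK_MR + i] = (i < m) ? a[i + p * lda] : 0.0;
        }
    }
}
```

After packing, the kernel reads `packed` with unit stride regardless of the original `lda`, which is where the better cache behavior would come from in this sketch.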

slow generic implementation

OK, I guess I'm not really clear why you care about the performance of the BLIS `generic` configuration. Even with cache blocking it will never be "high performance".

slow generic implementation

@loveshack Returning to the original question: I think one way to make the "generic" implementation faster would be to add a fully-unrolled branch and temporary storage of C to the...
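As a rough illustration of "temporary storage of C" with compile-time loop bounds, the sketch below accumulates one block in a local array and only touches C at the end. It is an assumption-laden example, not the BLIS reference kernel: `ref_gemm_ukr`, the block sizes `MR` and `NR`, and the packed layouts of `a` and `b` are all hypothetical placeholders.

```c
#include <stddef.h>

/* Placeholder block sizes; fixed at compile time so the compiler can
   fully unroll and vectorize the two inner loops. */
enum { MR = 4, NR = 8 };

/*
 * Sketch of a reference microkernel computing C += A * B for one
 * MR x NR block.
 *   a: packed MR x k panel, element (i, p) at a[p * MR + i]
 *   b: packed k x NR panel, element (p, j) at b[p * NR + j]
 *   c: element (i, j) at c[i * rs_c + j * cs_c]
 */
void ref_gemm_ukr(size_t k, const double *a, const double *b,
                  double *c, size_t rs_c, size_t cs_c)
{
    /* Temporary row-major storage for the block of C. */
    double ab[MR * NR] = { 0.0 };

    for (size_t p = 0; p < k; ++p) {
        for (size_t i = 0; i < MR; ++i) {
            for (size_t j = 0; j < NR; ++j) {
                ab[i * NR + j] += a[p * MR + i] * b[p * NR + j];
            }
        }
    }

    /* Write back once, honoring the general strides of C. */
    for (size_t i = 0; i < MR; ++i) {
        for (size_t j = 0; j < NR; ++j) {
            c[i * rs_c + j * cs_c] += ab[i * NR + j];
        }
    }
}
```

Keeping the accumulator local means the hot loop never depends on the strides of C, so the compiler is free to hold `ab` in registers where the target allows it.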

slow generic implementation

@loveshack What architectures in particular are you having a problem with?

slow generic implementation

@fgvanzee I was mostly talking about the actual `generic` configuration vs. the reference kernel being used in a particular configuration.

slow generic implementation

> i686, ppc64, ppc64le, and s390x

@loveshack For which of those architectures can we assume vectorization with the default flags?

slow generic implementation

@fgvanzee I would suggest:

1. Changing the default MR and NR to 4x16, 4x8, 4x8, 4x4 (sdcz).
2. Rewriting the reference gemm kernel to:
   a. be row-major,
   b. be fully...