Dave Love

41 comments by Dave Love

I hadn't seen this issue originally. I should have posted full results here before for SKX, in particular, despite the response when I reported on DGEMM originally. Anyhow, there are...

You wrote:

> The `generic` implementation will have better cache behavior than
> netlib BLAS,

That's what I thought.

> but will also do packing which will slow things down...

You wrote:

> @devinamatthews It may also be that Fortran is better than C :trollface:

Of course, but a sometime GNU Fortran maintainer knows how :-/.

You wrote:

> Debian tries to help upstream spot problems, not to build software as
> fast as possible. In order to build a reliable linux distribution it's
> not...

You wrote:

> OK, I guess I'm not really clear why you care about the performance of
> the BLIS `generic` configuration. Even with cache blocking it will
> never...

You wrote:

> Field:
>
> Next time a vendor offers to donate hardware, you might ask for a big SSD
> so you can setup a virtual machine for...

You wrote:

> @loveshack Returning to the original question: I think one way to make
> the "generic" implementation faster would be to add a fully-unrolled
> branch and temporary...

You wrote:

> @loveshack What architectures in particular are you having a problem with?

The Fedora architectures that BLIS doesn't support I think are i686, ppc64, ppc64le, and s390x; there...

You wrote:

> @devinamatthews Ah, makes sense. Thanks for clarifying. Yeah,
> `generic` doesn't do jack except use `-O3`, which I'm guessing in our
> world doesn't do much either....

You wrote:

> It might be interesting to see if simd pragmas cause anything better to
> happen with the reference kernel. I’ve got a list of all of those,...