Jeff Hammond
@fgvanzee I'd like to recant my prior comment in https://github.com/flame/blis/issues/234#issuecomment-405753540. For quantum chemistry, float16 might end up being more interesting. We are still studying this but it is ideal to...
Intel published the BF16 ISA in the April 2019 update (319433-036) of the [Intel® Architecture Instruction Set Extensions and Future Features Programming Reference](https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf). There is an unofficial synopsis for those...
> I'm trying to imagine what could have changed (what observations you could have made) that would flip the polarity on this issue. (You need those extra three...
@jacobgorm https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point also says

> `__fp16` is supported on every target, as it is purely a storage format; see below.

and

> `__fp16` is a storage and interchange format only....
@jacobgorm Yes, of course, but since I work for Intel, I have an interest in implementing something that is not restricted to ARM architectures 😃 In any case, since BLIS...
@amirgholami BLIS doesn't support GPUs but TF32 is just a form of 19-bit floating-point with 32b data. In the absence of hardware support, there is no upside versus SGEMM. In...
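To make the "19-bit floating-point with 32b data" point concrete, here is a minimal sketch that emulates TF32 precision in software on an ordinary `float`. It assumes the usual TF32 layout of 1 sign bit, 8 exponent bits, and 10 mantissa bits, and it truncates rather than rounds, which may not match what real hardware does. The name `tf32_truncate` is hypothetical and is not part of BLIS or any vendor API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit manipulation below assumes a 32-bit IEEE-754 binary32 float. */
static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Hypothetical helper: keep sign (1) + exponent (8) + top 10 mantissa bits,
 * i.e. 19 significant bits, and zero the low 13 mantissa bits.
 * This is truncation toward zero; hardware rounding may differ. */
float tf32_truncate(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined type punning */
    bits &= 0xFFFFE000u;              /* clear the low 13 mantissa bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

The value still occupies 32 bits in memory, so without hardware that multiplies the reduced format faster, a GEMM on such data moves as many bytes as SGEMM and only loses precision.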
To be clear, the final version will not abort. But I could not figure out how to set BLIS threading variables correctly. That's why there's a preprocessor warning saying "please help...
I agree. We can do slightly better than serialization if nested is enabled but I don't think that's an important thing to spend time on.
The affinity mask HW thread count is compared to the user-specified SW thread count.
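The comparison described above can be sketched roughly as follows. This is not the code from the pull request: it is a Linux-only illustration that assumes glibc's `sched_getaffinity` and `CPU_COUNT`, and the helper names `hw_threads_in_affinity_mask` and `is_oversubscribed` are hypothetical.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t helpers such as CPU_COUNT */
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: number of hardware threads the calling thread is
 * allowed to run on, or -1 if the affinity mask cannot be queried. */
int hw_threads_in_affinity_mask(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Hypothetical helper: returns 1 when more software threads are requested
 * than the affinity mask provides hardware threads, 0 otherwise.
 * An unknown hardware count (<= 0) is treated as "cannot tell". */
int is_oversubscribed(int sw_threads, int hw_threads)
{
    assert(sw_threads > 0);
    return hw_threads > 0 && sw_threads > hw_threads;
}
```

A caller would pass the user-requested thread count together with the result of `hw_threads_in_affinity_mask()` and, when the check returns 1, fall back to fewer threads or emit a warning.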
I've asked @egaudry to test with his application, which may be setting affinity masks via MPI or HWLOC. But I can say that it works to detect oversubscription for me....