Jeff Hammond

http://jeffhammond.github.io/

@nvidia, Helsinki, Finland

HPC software @NVIDIA in 🇫🇮. Previously @Intel HPC, @argonne-lcf w/ Blue Gene and MPI. PhD in Chemistry from @uchicago for work on @nwchemgit. He/him/hän.

Results: 414 comments of Jeff Hammond

Add support for float16 (half-precision floats) and related operations such as hgemm()

@fgvanzee I'd like to recant my prior comment in https://github.com/flame/blis/issues/234#issuecomment-405753540. For quantum chemistry, float16 might end up being more interesting. We are still studying this but it is ideal to...

Add support for float16 (half-precision floats) and related operations such as hgemm()

Intel published the BF16 ISA in the April 2019 update (319433-036) of the [Intel® Architecture Instruction Set Extensions and Future Features Programming Reference](https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf). There is an unofficial synopsis for those...

Add support for float16 (half-precision floats) and related operations such as hgemm()

> I'm trying to imagine what could have changed (what observations you could have made) that would flip the polarity on this issue. (You need those extra three...

Add support for float16 (half-precision floats) and related operations such as hgemm()

@jacobgorm https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point also says

> __fp16 is supported on every target, as it is purely a storage format; see below.

and

> __fp16 is a storage and interchange format only....

Add support for float16 (half-precision floats) and related operations such as hgemm()

@jacobgorm Yes, of course, but since I work for Intel, I have an interest in implementing something that is not restricted to ARM architectures 😃 In any case, since BLIS...

Add support for float16 (half-precision floats) and related operations such as hgemm()

@amirgholami BLIS doesn't support GPUs but TF32 is just a form of 19-bit floating-point with 32b data. In the absence of hardware support, there is no upside versus SGEMM. In...
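To illustrate the "19-bit floating-point with 32b data" point: TF32 keeps the float32 sign bit and 8 exponent bits but only 10 mantissa bits, so each value still occupies a 32-bit word. Below is a minimal sketch in C, assuming plain truncation of the low 13 mantissa bits (real hardware may round differently). The helper name `tf32_truncate` is hypothetical and is not part of BLIS or any vendor API.

```c
#include <stdint.h>
#include <string.h>

/* The bit manipulation below assumes a 32-bit IEEE-754 float. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: emulate TF32 precision in software.
 * float32 layout: 1 sign | 8 exponent | 23 mantissa bits.
 * TF32 layout:    1 sign | 8 exponent | 10 mantissa bits (19 bits total).
 * Assumption: the 13 discarded mantissa bits are truncated, not rounded. */
static float tf32_truncate(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* type-pun without undefined behavior */
    bits &= 0xFFFFE000u;              /* clear the low 13 mantissa bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

The result is still stored as a 32-bit float, so in software this saves neither memory nor bandwidth relative to SGEMM; it only discards precision.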

add method to query HW size

To be clear, the final version will not abort. But I could not figure out how to set BLIS threading variables correctly. That's why there's a preprocessing warning saying "please help...

add method to query HW size

I agree. We can do slightly better than serialization if nested is enabled but I don't think that's an important thing to spend time on.

add method to query HW size

The HW thread count from the affinity mask is compared to the user-specified SW thread count.
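The comparison described above can be sketched roughly as follows. This is not the code from the pull request: it assumes Linux with glibc (`sched_getaffinity` and `CPU_COUNT`), and the function names `hw_threads_in_affinity_mask` and `is_oversubscribed` are hypothetical.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t helpers; must precede all includes */
#include <sched.h>
#include <assert.h>

/* Hypothetical helper: number of HW threads the calling thread may run on,
 * taken from its affinity mask. Returns -1 if the query fails. */
static int hw_threads_in_affinity_mask(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Hypothetical helper: nonzero when the requested SW thread count exceeds
 * the HW threads available. An unknown HW count (<= 0) is treated as
 * "not oversubscribed" so that a failed query never triggers a warning. */
static int is_oversubscribed(int sw_threads, int hw_threads)
{
    assert(sw_threads >= 1);
    return hw_threads > 0 && sw_threads > hw_threads;
}
```

A caller would pass the user-specified thread count together with `hw_threads_in_affinity_mask()` and, on a nonzero result, warn or reduce the thread count rather than abort.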

add method to query HW size

I've asked @egaudry to test with his application, which may be setting affinity masks via MPI or HWLOC. But I can say that it works to detect oversubscription for me....
