OpenBLAS
Lacking quad precision support
Support for quad precision seems to be broken or maybe entirely missing. Setting QUAD_PRECISION=1 in Makefile.rule and running make results in a compilation error caused by the missing header common_quad.h.
It could be that this has not been tried in a long time. I see there is a file named common_q.h with what seems to be matching contents; if you are feeling adventurous you could try renaming it. (From the git history, this apparent misnaming must have already been present in GotoBLAS2-1.13. Perhaps at some point Mr. Goto had to use an operating system that did not support longer filenames?)
Actually, common_q.h is included from common_macro.h, so I am less sure now that it is the elusive common_quad.h. Perhaps it would be more instructive to just comment out the inclusion of common_quad.h in common.h and see what errors, if any, this causes. Interestingly, this file was definitely missing in both GotoBLAS2 versions 1.08 and 1.13, and the earlier GotoBLAS did not contain any code for QUAD_PRECISION, at least in versions 1.00 and 1.07. (Unfortunately I do not have any versions of GotoBLAS2 earlier than 1.08, but it does look like even the man himself did not test this part of his code.)
interface/* actually deals only with single and double precision. What may be perceived as quad precision support is actually an early development stage that was abandoned long ago.
Indeed, it looks like there is only some very sketchy support in level3.c, plus prototypes for the kernel functions qconjg and qcabs, of which only qconjg has assembly implementations for x86 and x86_64. A naive build attempt fails due to various clashes between "long double" and the xdouble struct of two long doubles that Goto added for "QUAD_PRECISION". Time to change at least the comment accompanying this option in Makefile.rule.
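To illustrate the kind of clash mentioned above, here is a minimal sketch in C. The names xdouble_pair, hi, lo, two_sum and pair_add are hypothetical and chosen for illustration only; they are not taken from the OpenBLAS or GotoBLAS2 sources, and the actual xdouble definition may differ. The sketch only shows why code written for a scalar long double cannot simply be recompiled against a struct of two long doubles: the built-in arithmetic operators no longer apply, so every operation needs an explicit helper.

```c
/* Hypothetical stand-in for a "struct of two long doubles" type.
   Name and field layout are assumptions, not the real xdouble. */
typedef struct {
    long double hi;  /* leading part of the value */
    long double lo;  /* trailing correction term */
} xdouble_pair;

/* Knuth's two-sum: hi is the rounded sum a + b,
   lo is the rounding error of that addition. */
static xdouble_pair two_sum(long double a, long double b)
{
    xdouble_pair r;
    long double bb;
    r.hi = a + b;
    bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);
    return r;
}

/* Simplified pair addition: add the leading parts exactly,
   fold in the trailing parts, then renormalize. */
static xdouble_pair pair_add(xdouble_pair a, xdouble_pair b)
{
    xdouble_pair s = two_sum(a.hi, b.hi);
    long double lo = s.lo + a.lo + b.lo;
    xdouble_pair r;
    r.hi = s.hi + lo;
    r.lo = lo - (r.hi - s.hi);
    return r;
}

/* Scalar-style code such as
       xdouble_pair c = a + b;
   does not compile, because '+' is not defined for structs.
   Every such expression would have to become pair_add(a, b). */
```

Any routine that mixes the two representations, for example by passing a long double where the struct is expected, fails at compile time in the same way.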
Would be great to see the introduction of quad precision (even without optimization as a first step).
- it would help take advantage of those architectures that offer native support for quadruple precision (e.g., ARM 64).
- it would help push numpy to also introduce a proper quad precision type and get rid of that weird, slow and not very useful intel-ism, the 80-bit float aligned on 128 bits.
- from a practical point of view it would allow many code-bases to develop test cases to be run every now and then at a higher precision, to verify the influence of numeric noise.
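The last point can be sketched in C as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and long double is used as a stand-in for "a higher precision". On x86 that is typically the 80-bit extended format rather than true quad precision, and on some toolchains long double is the same as double, in which case the reported gap is simply zero.

```c
#include <stddef.h>

/* Naive summation carried out entirely in double precision. */
static double sum_double(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* The same summation with a long double accumulator. */
static long double sum_long_double(const double *x, size_t n)
{
    long double s = 0.0L;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Absolute difference between the two results: a rough indicator
   of how much rounding noise the double precision run accumulated. */
static long double precision_gap(const double *x, size_t n)
{
    long double d = sum_long_double(x, n) - (long double)sum_double(x, n);
    return d < 0.0L ? -d : d;
}
```

A test case would call precision_gap on representative input and flag results where the gap is large relative to the expected accuracy.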
First, the aarch64 ABI definition allows passing float128 arguments, but there is no silicon to do this; it is all emulation.
Second, Numpy uses the generic BLAS API, i.e. the one from the Netlib reference LAPACK, which would have to name the functions and provide a reference API first.
Third, from a practical point of view it will be either 100x slower integer emulation (gfortran) or clamped values as you describe (gcc).
To summarize: in the 3 years since this request (or the 10+ years since GotoBLAS abandoned development of quad support), nothing has changed regarding practical, portable support for 128-bit floats.
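For readers who want to check which of the formats discussed in this thread their own toolchain uses for long double, a small C sketch follows. It relies only on LDBL_MANT_DIG from the standard header float.h; the function names and the descriptive strings are illustrative assumptions. Note that it reports the storage format only and says nothing about whether the arithmetic runs in hardware or is emulated in software.

```c
#include <float.h>

/* Number of mantissa bits the compiler uses for long double. */
static int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Map the mantissa width to the commonly seen long double formats.
   The list covers typical cases and is not exhaustive. */
static const char *long_double_format(void)
{
    switch (LDBL_MANT_DIG) {
    case 53:  return "same as double (binary64)";
    case 64:  return "x87 80-bit extended";
    case 106: return "pair of doubles (double-double)";
    case 113: return "IEEE binary128 (quad)";
    default:  return "unknown";
    }
}
```

Printing long_double_format() from a test program is usually enough to tell whether a given platform offers a 128-bit long double at the language level at all.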