Ensure only public API has default symbol visibility, hide private symbols
This was attempted once in gh-3658 but abandoned. I think the right way of doing it is as pointed out in https://github.com/OpenMathLib/OpenBLAS/pull/3658#issuecomment-1176201131, something like this in a header and then marking public functions with OPENBLAS_API:
#if defined (_WIN32) || defined (__CYGWIN__)
# if defined (__GNUC__)
/* GCC */
# define OPENBLAS_EXPORT __attribute__ ((dllexport))
# define OPENBLAS_IMPORT __attribute__ ((dllimport))
# else
/* MSVC */
# define OPENBLAS_EXPORT __declspec(dllexport)
# define OPENBLAS_IMPORT __declspec(dllimport)
# endif
#else
/* All other platforms. */
# define OPENBLAS_EXPORT __attribute__ ((visibility ("default")))
# define OPENBLAS_IMPORT
#endif
#if defined (OPENBLAS_SHLIB)
# define OPENBLAS_API OPENBLAS_EXPORT
#else
# define OPENBLAS_API OPENBLAS_IMPORT
#endif
This resurfaced again in https://github.com/numpy/numpy/pull/30299, specifically on Alpine (a Musl-based distro) because Musl libc has different behavior from glibc and doesn't support dlopen'ing a shared library with RTLD_LOCAL. This is what NumPy does for its vendored libscipy_openblas.so which hides the (non-suffixed) private symbols well enough on glibc-based distros.
The method above is the standard way of exporting a public API. I think we should try to implement that; it's a fair bit of plumbing but it should be doable and will prevent obscure issues (e.g., see https://github.com/MacPython/openblas-libs/issues/79 for a previous one).
If going this way, would it be worth to also include symbol prefix/suffix with such a macro ? e.g. something like
#define OPENBLAS_API(return_type, base_name, args) OPENBLAS_EXPORT return_type prefix##base_name##suffix args
and a similar definition for internal symbols
#define OPENBLAS_IAPI(return_type, base_name, args) ...
That's a great idea, thanks @mayeut. That may allow getting rid of the hacky objcopy business, and possibly make it easier to achieve other goals like switching from 64_ to the new Netlib convention (_64), or building a single shared library with both LP64 and ILP64 symbols.
I'm not sure internal symbols need the prefix/suffix mangling. Right now there are already unmangled internal symbols, and if their visibility is set to private it shouldn't matter anymore.
I made some progress with this over at https://github.com/MacPython/openblas-libs/pull/241. The code now looks like
__attribute__ ((visibility ("default")))
#ifdef
void NAME(int a, int b) {
#else
void CNAME(int a, int b) {
...code
}
These files are compiled a couple of times with different defines for NAME and CNAME. So in order to get rid of the objcopy things we would have to change the makefile definitions of -DNAME and -DCNAME here https://github.com/OpenMathLib/OpenBLAS/blob/68ff451eccea3e7d7d2afb2588b63552bde7ea89/Makefile.system#L1623
That might be orthagonal to the goal of this issue.
The openblas-libs PR above could be improved to replace the __attribute__ ((visibility ("default"))) with the OPENBLAS_EXPORT above. Note that the cblas.h header is generated, so OPENBLAS_IMPORT may not be needed.
I made quite a bit of progress in MacPython/openblas-libs#241. One stumbling block: the assembly code files use PROLOGUE to define the functions, so I added a hidden directive to all the assembly functions by simply changing the #define PROLOGUE. It turns out there is assembly code for these functions which need to be exposed, they are part of the BLAS standard:
int lsame(char *a, char *b)which compares*a == *bFLOAT cabs(FLOAT *a)which doesfabs(a[0]) + fabs(a[1]).
Both these functions have C-based equivalents in the kernels/generics directory, and I checked on Compiler Explorer that modern compilers generate equivalent assembler to the hand-created assembler in the non-generic directories. So rather than create a special non-hidden assembler PROLOGUE macro, I set the build to use the generic C versions, which can easily be marked with the visible attribute. Would a PR to use the C code (and remove the assembly code kernels and maybe even move the implementations from kernel/generic to /interface) be acceptable?