Octavian.jl

Feature request: use libblastrampoline (LBT) to select Octavian as the BLAS

Open DilumAluthge opened this issue 4 years ago • 11 comments

See also:

  1. https://github.com/staticfloat/libblastrampoline
  2. https://github.com/JuliaLang/julia/pull/39455

DilumAluthge, Feb 17 '21 18:02

Currently, Octavian doesn't implement all of the BLAS operations. We only have GEMM right now. But still, it could be a very cool proof-of-concept if we could figure out how to hook up Octavian to LBT. GEMM would use Octavian, and I guess other operations would just throw a runtime error.

DilumAluthge, Feb 17 '21 18:02

@ViralBShah @staticfloat Is there any documentation on how we might implement this for Octavian?

DilumAluthge, Feb 17 '21 18:02

Right - so Octavian is a pure Julia library. If we want to use it as a replacement for the actual BLAS, it needs to expose an interface so that it can be called with ccall, just like any other BLAS. Perhaps with something like @ccallable.
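A minimal sketch of what "callable with ccall" could look like. It swaps in `@cfunction` for `@ccallable`, because `@cfunction` yields a C function pointer at runtime without building a shared library. The name `dgemm64_sketch`, the naive triple loop, and the choice of the ILP64 Fortran-style `dgemm` signature are assumptions for illustration only; a real version would presumably call Octavian's kernels where the loop is.

```julia
# Hypothetical wrapper using the ILP64 Fortran-style `dgemm` convention:
# every argument arrives by reference and matrices are column-major.
function dgemm64_sketch(transA::Ptr{UInt8}, transB::Ptr{UInt8},
                        m::Ptr{Int64}, n::Ptr{Int64}, k::Ptr{Int64},
                        alpha::Ptr{Float64}, A::Ptr{Float64}, lda::Ptr{Int64},
                        B::Ptr{Float64}, ldb::Ptr{Int64},
                        beta::Ptr{Float64}, C::Ptr{Float64}, ldc::Ptr{Int64})::Cvoid
    M, N, K = unsafe_load(m), unsafe_load(n), unsafe_load(k)
    LDA, LDB, LDC = unsafe_load(lda), unsafe_load(ldb), unsafe_load(ldc)
    α, β = unsafe_load(alpha), unsafe_load(beta)
    ta = uppercase(Char(unsafe_load(transA))) != 'N'
    tb = uppercase(Char(unsafe_load(transB))) != 'N'
    # Element access for op(A) and op(B), honoring the transpose flags.
    a(i, l) = ta ? unsafe_load(A, l + (i - 1) * LDA) : unsafe_load(A, i + (l - 1) * LDA)
    b(l, j) = tb ? unsafe_load(B, j + (l - 1) * LDB) : unsafe_load(B, l + (j - 1) * LDB)
    for j in 1:N, i in 1:M
        acc = 0.0
        for l in 1:K
            acc += a(i, l) * b(l, j)
        end
        idx = i + (j - 1) * LDC
        # When beta is zero, C is treated as write-only and never read.
        cij = β == 0 ? 0.0 : β * unsafe_load(C, idx)
        unsafe_store!(C, α * acc + cij, idx)
    end
    return nothing
end

# A C function pointer that a trampoline (or a plain ccall) could target.
const dgemm64_ptr = @cfunction(dgemm64_sketch, Cvoid,
    (Ptr{UInt8}, Ptr{UInt8}, Ptr{Int64}, Ptr{Int64}, Ptr{Int64},
     Ptr{Float64}, Ptr{Float64}, Ptr{Int64}, Ptr{Float64}, Ptr{Int64},
     Ptr{Float64}, Ptr{Float64}, Ptr{Int64}))

# Call through the pointer the same way a BLAS consumer would.
A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]
C = zeros(2, 2)
ccall(dgemm64_ptr, Cvoid,
      (Ref{UInt8}, Ref{UInt8}, Ref{Int64}, Ref{Int64}, Ref{Int64},
       Ref{Float64}, Ptr{Float64}, Ref{Int64}, Ptr{Float64}, Ref{Int64},
       Ref{Float64}, Ptr{Float64}, Ref{Int64}),
      UInt8('N'), UInt8('N'), 2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2)
```

In a package, the pointer would need to be created at load time (for example in `__init__`), since `@cfunction` pointers do not survive precompilation.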

ViralBShah, Feb 17 '21 18:02

It only has gemms, so there'd be a lot missing. But perhaps it could motivate more effort to implement linear algebra functionality.

Unfortunately, we couldn't use RecursiveFactorization, since it still depends on BLAS.

chriselrod, Feb 17 '21 19:02

One can imagine more complicated schemes, where we only own the gemms and forward the rest to OpenBLAS. This is possible with LBT.

ViralBShah, Feb 17 '21 19:02

That's great, so we could replace one piece at a time.

chriselrod, Feb 17 '21 20:02

We can export an LBT API that allows manually setting function pointers; so if you can get a function pointer to a Julia function, we can have LBT forward to it. But you wouldn't get all the niceness that LBT affords like auto-detecting bitwidth and such. That's probably fine, just know that it would be a real footgun of an API.

So you'd do something like:

ccall((:set_fptr, libblastrampoline), Cvoid, (Cstring, Ptr{Cvoid}), "dgemm_64_", Octavian.dgemm64_ptr)

You would be responsible for all LP64/ILP64 concerns, and for ensuring that your internal implementation of dgemm() doesn't accidentally call a BLAS dgemm anywhere, as that would cause infinite recursion. ;)
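On Julia 1.7 and later, where LBT ships with the language, the interface of each loaded backend can be inspected before deciding which symbol a forward should target. The sketch below only reads the configuration and changes nothing. `blas_symbol` is a hypothetical helper, and the suffix rule it encodes (`_64_` for ILP64, `_` for LP64) is assumed from the `dgemm_64_` naming used in the call above.

```julia
using LinearAlgebra

# Hypothetical helper: build the symbol name for a given BLAS routine and
# integer interface (`:ilp64` means 64-bit integers, `:lp64` means 32-bit).
blas_symbol(name::AbstractString, interface::Symbol) =
    interface === :ilp64 ? string(name, "_64_") : string(name, "_")

# List every library LBT currently forwards to, with its interface and the
# `dgemm` symbol name that matches it.
for lib in BLAS.get_config().loaded_libs
    println(lib.libname, ": interface = ", lib.interface,
            ", dgemm symbol = ", blas_symbol("dgemm", lib.interface))
end
```

Getting this wrong is the footgun: a function written for 64-bit integers but registered under the LP64 name would read its dimension arguments with the wrong width.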

staticfloat, Feb 17 '21 20:02

NVBLAS does something like this, for example: https://github.com/staticfloat/libblastrampoline/issues/23

ViralBShah, Feb 17 '21 20:02

I would also be interested in the fallback approach in LBT. I have been wanting to finally get BLASFEO working as a BLAS backend in Julia so I can play with it more on ARM. One of the things that has slowed me down is that it doesn't define all the BLAS functions, only the ones it accelerates, so being able to fall back for the others would be great. The "footgun API" would probably work for what I want to do, since I will also have to detect and choose the actual dynamic library that is loaded and called into (BLASFEO uses compile-time options for the architecture, and each compiled library supports only one architecture).

imciner2, Feb 18 '21 00:02

LBT could use some contributions to make this possible. Currently it processes every single API. We just need to generalize it a bit so that it can load symbols from a complete BLAS (like right now) and then overwrite some from a different list (like for BLASFEO, a future libOctavian, NVBLAS, etc.).

ViralBShah, Feb 18 '21 00:02

If it's an actual shared library, it's probably not so bad; you can do something like:

using Libdl

# Open BLASFEO
blasfeo_handle = dlopen(blasfeo_path, RTLD_LOCAL)

# Configure BLASFEO as ILP64 or whatever (`configure_blasfeo` is a placeholder)
configure_blasfeo(blasfeo_handle)

# Call LBT's `load_blas_funcs()`, with `clear` set to `false` and `verbose` set to `true`
# Note that this is actually a `ccall()`, I'm just pseudo-coding here
load_blas_funcs(blasfeo_path, 0, 1)

So you can load BLASFEO, configure it, then tell LBT to load whatever functions it can from it, overriding the OpenBLAS functions and leaving whatever doesn't exist within BLASFEO still pointing at OpenBLAS. If you set clear to 1, it will clear those out first so that it's a "purely BLASFEO" setup, which will crash if you try to call something that is a NULL pointer because it doesn't exist in BLASFEO.
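The effect of `clear` can be modeled without touching any real library. The following is a toy sketch, not LBT's implementation: `ForwardTable` and `load_funcs!` are hypothetical names, backend names stand in for function pointers, `nothing` stands in for a NULL pointer, and the three symbols (and the assumption that the partial backend provides only `dgemm_64_`) are chosen purely for illustration.

```julia
# Toy forwarding table: symbol name => backend serving it (`nothing` ~ NULL).
const ForwardTable = Dict{String,Union{Nothing,String}}

# Point every symbol in `provided` at `backend`. With `clear = true`, first
# reset the whole table so that nothing from earlier backends survives.
function load_funcs!(table::ForwardTable, backend::String, provided; clear::Bool=false)
    if clear
        for name in collect(keys(table))
            table[name] = nothing
        end
    end
    for name in provided
        haskey(table, name) && (table[name] = backend)
    end
    return table
end

symbols = ["dgemm_64_", "dgemv_64_", "dtrsm_64_"]
table = ForwardTable(name => nothing for name in symbols)

load_funcs!(table, "OpenBLAS", symbols)        # complete BLAS first
load_funcs!(table, "BLASFEO", ["dgemm_64_"])   # partial override, clear = false

# Same override with clear = true: only the overriding backend remains.
pure = load_funcs!(copy(table), "BLASFEO", ["dgemm_64_"]; clear=true)
```

After the override, `table["dgemm_64_"]` points at the partial backend while the other entries still point at OpenBLAS. In `pure`, every entry other than `dgemm_64_` is `nothing`, which corresponds to the crash-on-call case described above.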

staticfloat, Feb 18 '21 00:02