LoopVectorization.jl
Tracking SpecialFunctions.jl support
This is a follow-up to #232. Right now I'm mainly interested in LV's support for the erf family (erfc and erfcx in particular), but while we're at it, it may be useful to track the whole SpecialFunctions.jl family:
- [ ] gamma(z) gamma function Γ(z)
- [ ] loggamma(x) accurate log(gamma(x)) for large x
- [ ] logabsgamma(x) accurate log(abs(gamma(x))) for large x
- [ ] logfactorial(x) accurate log(factorial(x)) for large x; same as loggamma(x+1) for x > 1, zero otherwise
- [ ] digamma(x) digamma function (i.e. the derivative of loggamma at x)
- [ ] invdigamma(x) invdigamma function (i.e. inverse of digamma function at x using fixed-point iteration algorithm)
- [ ] trigamma(x) trigamma function (i.e. the logarithmic second derivative of gamma at x)
- [ ] polygamma(m,x) polygamma function (i.e. the (m+1)-th derivative of the loggamma function at x)
- [ ] gamma(a,z) upper incomplete gamma function Γ(a,z)
- [ ] loggamma(a,z) accurate log(gamma(a,z)) for large arguments
- [ ] gamma_inc(a,x,IND) incomplete gamma function ratio P(a,x) and Q(a,x)
- [ ] beta_inc(a,b,x,y) incomplete beta function ratio Ix(a,b) and Iy(a,b)
- [ ] gamma_inc_inv(a,p,q) inverse of incomplete gamma function ratio P(a,x) and Q(a,x)
- [ ] beta(x,y) beta function at x,y
- [ ] logbeta(x,y) accurate log(beta(x,y)) for large x or y
- [ ] logabsbeta(x,y) accurate log(abs(beta(x,y))) for large x or y
- [ ] logabsbinomial(n,k) accurate log(abs(binomial(n,k))) for large n and k near n/2
- [ ] expint(ν, z) exponential integral
- [ ] expinti(x) exponential integral
- [ ] expintx(x) scaled exponential integral
- [ ] sinint(x) sine integral
- [ ] cosint(x) cosine integral
- [x] erf(x) : ref. #232
- [ ] erf(x,y)
- [ ] erfc(x) complementary error function
- [ ] erfcinv(x) inverse function to erfc()
- [ ] erfcx(x) scaled complementary error function
- [ ] logerfc(x) log of the complementary error function
- [ ] logerfcx(x) log of the scaled complementary error function
- [ ] erfi(x) imaginary error function defined as -i erf(ix)
- [ ] erfinv(x) inverse function to erf()
- [ ] dawson(x) scaled imaginary error function, a.k.a. Dawson function
- [ ] airyai(z) Airy Ai function at z
- [ ] airyaiprime(z) derivative of the Airy Ai function at z
- [ ] airybi(z) Airy Bi function at z
- [ ] airybiprime(z) derivative of the Airy Bi function at z
- [ ] airyaix(z), airyaiprimex(z), airybix(z), airybiprimex(z) scaled Airy Ai function and kth derivatives at z
- [ ] besselj(nu,z) Bessel function of the first kind of order nu at z
- [ ] besselj0(z) besselj(0,z)
- [ ] besselj1(z) besselj(1,z)
- [ ] besseljx(nu,z) scaled Bessel function of the first kind of order nu at z
- [ ] sphericalbesselj(nu,z) Spherical Bessel function of the first kind of order nu at z
- [ ] bessely(nu,z) Bessel function of the second kind of order nu at z
- [ ] bessely0(z) bessely(0,z)
- [ ] bessely1(z) bessely(1,z)
- [ ] besselyx(nu,z) scaled Bessel function of the second kind of order nu at z
- [ ] sphericalbessely(nu,z) Spherical Bessel function of the second kind of order nu at z
- [ ] besselh(nu,k,z) Bessel function of the third kind (a.k.a. Hankel function) of order nu at z; k must be either 1 or 2
- [ ] hankelh1(nu,z) besselh(nu, 1, z)
- [ ] hankelh1x(nu,z) scaled besselh(nu, 1, z)
- [ ] hankelh2(nu,z) besselh(nu, 2, z)
- [ ] hankelh2x(nu,z) scaled besselh(nu, 2, z)
- [ ] besseli(nu,z) modified Bessel function of the first kind of order nu at z
- [ ] besselix(nu,z) scaled modified Bessel function of the first kind of order nu at z
- [ ] besselk(nu,z) modified Bessel function of the second kind of order nu at z
- [ ] besselkx(nu,z) scaled modified Bessel function of the second kind of order nu at z
- [ ] jinc(x) scaled Bessel function of the first kind divided by x. A.k.a. sombrero or besinc
- [ ] ellipk(m) complete elliptic integral of 1st kind K(m)
- [ ] ellipe(m) complete elliptic integral of 2nd kind E(m)
- [ ] eta(x) Dirichlet eta function at x
- [ ] zeta(x) Riemann zeta function at x
- [ ] binomial(n, p)
VectorizationBase.jl may be a better place for this issue, but this is fine too.
erf only gets partial credit at the moment.
If someone wants to work on it, some of these can probably be translated fairly directly from SpecialFunctions.jl to handle VectorizationBase.AbstractSIMD arguments.
Some may require functions like SLEEFPirates.cot, which could be moved to VectorizationBase.
However, a direct translation isn't always possible, e.g. when an implementation branches to behave differently depending on the argument's range: a Taylor series for small values versus recurrences for large ones.
Also of interest: https://github.com/JuliaMath/openspecfun/blob/master/Faddeeva/Faddeeva.cc
SpecialFunctions doesn't have Julia implementations for all of these yet.
I'd be happy to answer any questions that I can if you (or anyone else) wants to take a stab at it.
If for many functions it is just a matter of broadening the allowed input types, would it make sense to just have SpecialFunctions depend on VectorizationBase?
They'd also need to replace ifs with IfElse.ifelse.
Loops such as the one in digamma are trickier. Long term, I think it'd be cool to write something able to compile these functions to SIMD automatically, in a manner similar to ISPC.
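As a minimal sketch of the `if` → `ifelse` rewrite (using `Base.ifelse` here; `IfElse.ifelse` follows the same lane-wise pattern on SIMD masks), consider a toy function that uses a series for small arguments and a closed form for large ones. The branch-free version must evaluate both paths and blend, which is also why range-split implementations are costly to vectorize:

```julia
# Toy example, not a real special function.
# Scalar-friendly version: branches on the argument's range.
function f_branchy(x)
    if x < 1
        # "small x" path: truncated series exp(x) ≈ 1 + x + x^2/2
        1 + x + x^2 / 2
    else
        # "large x" path: exact exponential
        exp(x)
    end
end

# SIMD-friendly rewrite: compute BOTH paths, select with ifelse.
# Every lane pays for every path, so this only wins when the
# per-path work is cheap and branch-free.
function f_blend(x)
    small = 1 + x + x^2 / 2
    large = exp(x)
    Base.ifelse(x < 1, small, large)
end
```

For scalar inputs the two versions agree exactly; with `VectorizationBase.AbstractSIMD` arguments, `ifelse` selects per lane based on the comparison mask.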
VectorizationBase has unfortunately gotten a bit heavy as a dependency.
```
$ julia -O3 -q --startup=no -e '@time using VectorizationBase'
  0.561828 seconds (1.97 M allocations: 109.844 MiB, 1.22% gc time, 4.59% compilation time)
$ julia -O3 -q --startup=no -e '@time using VectorizationBase'
  0.520969 seconds (1.97 M allocations: 109.829 MiB, 1.33% gc time, 4.80% compilation time)
$ julia -O3 -q --startup=no -e '@time using VectorizationBase'
  0.575595 seconds (1.97 M allocations: 109.829 MiB, 1.22% gc time, 4.35% compilation time)
```

On another computer:

```
  0.872136 seconds (2.06 M allocations: 117.637 MiB, 1.62% gc time, 4.06% compilation time)
  0.870963 seconds (2.06 M allocations: 117.639 MiB, 1.62% gc time, 3.91% compilation time)
  0.899115 seconds (2.06 M allocations: 117.637 MiB, 1.61% gc time, 7.04% compilation time)
```
I'll have to take a look at how much this can be improved.
I could split off at least some of the hardware-related parts, but the llvmcall code would want to depend on/use at least the cpu_feature parts.
If possible, it would be helpful to emit a more specific warning when an unsupported special function is used. I came here after spending an hour before realizing that what was making "LoopVectorization.check_args on your inputs failed" appear was a call to logfactorial.
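Until logfactorial is supported, one possible workaround (my own sketch, not anything LV or SpecialFunctions provides) is a branch-free Stirling approximation: it uses only operations LV already handles (log and basic arithmetic), at the cost of accuracy for small arguments:

```julia
# Hedged sketch: Stirling's series for log(x!) = loggamma(x + 1).
# Reasonable for moderately large x (roughly x ≥ 5); NOT a drop-in
# replacement for SpecialFunctions.logfactorial at small x.
function logfactorial_stirling(x)
    # log(x!) ≈ x*log(x) - x + 0.5*log(2πx) + 1/(12x)
    x * log(x) - x + 0.5 * log(2π * x) + 1 / (12x)
end

# Sanity check against the exact value log(10!) = sum(log(k) for k = 1:10).
exact = sum(log, 1:10)
approx = logfactorial_stirling(10.0)
```

Because it is straight-line arithmetic with no branches, this form should vectorize inside `@turbo` loops today.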