expm1 low accuracy for complex argument
`expm1(1j*x)` has the same (low) accuracy for small complex arguments as `exp(1j*x) - 1`. For real arguments, the accuracy is as expected and on par with NumPy.
Code to reproduce:
```python
import matplotlib.pyplot as plt
import numpy as np
import numexpr as ne

x = np.geomspace(1e-18, 1e-6, 101)
numpy = {}
numpy_naive = {}
numexpr = {}
numexpr['float'] = ne.evaluate('expm1(x)')
numpy['float'] = np.expm1(x)
numpy_naive['float'] = np.exp(x) - 1
numexpr['complex'] = ne.evaluate('expm1(1j*x)')
numpy['complex'] = np.expm1(1j*x)
numpy_naive['complex'] = np.exp(1j*x) - 1

fig, ax = plt.subplots(2, 2, sharex=True, constrained_layout=True)
ax[0, 0].loglog(x, numpy['float'].real, label='numpy')
ax[0, 0].loglog(x, numpy_naive['float'].real, '-.', label='numpy naive')
ax[0, 0].loglog(x, numexpr['float'].real, '--', label='numexpr')
ax[1, 0].loglog(x, abs((numpy['float'] - numexpr['float']).real), label='|np - ne| (expm1)')
ax[0, 0].legend()
ax[1, 0].legend()
ax[0, 0].grid(True)
ax[1, 0].grid(True)
ax[0, 1].loglog(x, -numpy['complex'].real, label='numpy')
ax[0, 1].loglog(x, -numpy_naive['complex'].real, '-.', label='numpy naive')
ax[0, 1].loglog(x, -numexpr['complex'].real, '--', label='numexpr')
ax[1, 1].loglog(x, abs((numpy['complex'] - numexpr['complex']).real), label='|np - ne| (expm1)')
ax[0, 1].legend()
ax[1, 1].legend()
ax[0, 1].grid(True)
ax[1, 1].grid(True)
ax[0, 0].set_title('expm1(x)')
ax[0, 1].set_title('-real(expm1(1j*x))')
ax[1, 0].set_yscale('linear')
ax[1, 0].set_xlabel('x')
ax[1, 1].set_xlabel('x')
```
NumExpr appears to have been intentionally implemented with fast but low-precision complex functions. If you're interested, you can browse them here:
https://github.com/pydata/numexpr/blob/master/numexpr/complex_functions.hpp
If you want high-precision complex algebra in NumExpr, the easiest fix would be to compile with Intel MKL support (or use the conda distribution), as with MKL you have the option to set the precision for complex operations, e.g.

```python
import numexpr as ne
ne.set_vml_accuracy_mode('high')
```

Vanilla NumExpr is roughly equivalent in precision to the MKL 'fast' mode.
I see, thanks. I think this is quite misleading, since the whole point of a dedicated `expm1` is improved accuracy for small arguments due to special handling of the problematic computations (`cos(x) - 1` near zero in the imaginary case). To my understanding, increasing the floating-point precision will only move the crossover point where the accuracy breaks down. As it is now, `expm1` has the same speed and accuracy as `exp - 1` for complex input, which makes it kind of pointless.
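The cancellation in question can be seen in a minimal sketch (pure Python with the stdlib `math` module, for illustration only):

```python
import math

x = 1e-9  # small imaginary argument

# Naive real part of expm1(1j*x): cos(x) - 1 cancels catastrophically.
# cos(1e-9) rounds to exactly 1.0 in double precision, so this is 0.0.
naive = math.cos(x) - 1.0

# Stable form via the identity cos(x) - 1 = -2*sin(x/2)**2,
# which retains the leading term -x**2/2, about -5e-19 here.
stable = -2.0 * math.sin(x / 2.0) ** 2
```

No amount of extra working precision removes this cancellation entirely; it only pushes the breakdown to smaller `x`.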
Taking a look at how `expm1` is implemented for complex types in NumPy makes me think this should be easy to do here as well:
```c
static void
nc_expm1@c@(@ctype@ *x, @ctype@ *r)
{
    @ftype@ a = npy_sin@c@(x->imag / 2);
    r->real = npy_expm1@c@(x->real) * npy_cos@c@(x->imag) - 2 * a * a;
    r->imag = npy_exp@c@(x->real) * npy_sin@c@(x->imag);
    return;
}
```
I'd be happy to open a PR if you agree.
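As a sanity check, a pure-Python transcription of that algorithm (hypothetical name `expm1_complex`, using the stdlib `math` module) shows the expected leading-order behavior for small imaginary arguments:

```python
import math

def expm1_complex(z: complex) -> complex:
    """Python transcription of NumPy's nc_expm1 algorithm."""
    # Real part: expm1 handles small z.real, and the identity
    # cos(y) - 1 = -2*sin(y/2)**2 avoids cancellation in the cosine term.
    a = math.sin(z.imag / 2)
    re = math.expm1(z.real) * math.cos(z.imag) - 2 * a * a
    im = math.exp(z.real) * math.sin(z.imag)
    return complex(re, im)

w = expm1_complex(1e-9j)
# Leading-order expansion: expm1(1j*x) ≈ -x**2/2 + 1j*x for small x,
# so w.real should come out near -5e-19 instead of rounding to 0.
```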
Sure, a PR would be fine by me.
Just getting around to looking at this now. I found one unfortunate additional issue: for MKL, it isn't using a 'nice' implementation in `interpreter.cpp`:
```cpp
#ifdef USE_VML
/* complex expm1 not available in VML */
static void vzExpm1(MKL_INT n, const MKL_Complex16* x1, MKL_Complex16* dest)
{
    MKL_INT j;
    vzExp(n, x1, dest);
    for (j = 0; j < n; j++) {
        dest[j].real -= 1.0;
    };
};
```
`vzExp` is the MKL exponential function, but there's no corresponding `vzExpm1` function. I'm not sure why; I'll have to write some test code here.
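That shim subtracts 1 from the real part after the fact, which reintroduces cancellation for small real arguments too. A minimal sketch of the effect (stdlib `math`, for illustration only):

```python
import math

x = 1e-12

# exp(x) - 1 first rounds exp(x) to a double near 1.0, discarding most of
# the low-order digits of x; the relative error is around 1e-4 here.
naive = math.exp(x) - 1.0

# expm1 computes the same quantity without the intermediate rounding.
accurate = math.expm1(x)
```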
@thangleiter so I have not been able to replicate your problem. I made a new branch and pushed changes to it so I can test both functions simultaneously. Here's my test script:
https://github.com/pydata/numexpr/blob/issue418/issues/issue418.py
As you can see, I simultaneously have both versions of the function implemented. Would you look at it and see if I am doing anything wrong in my evaluation?
When you found problems with NumExpr in the initial issue report, were you using a NumExpr compiled with Intel MKL support? Either from conda or Gohlke's repo?
@thangleiter also which compiler are you using? We can check with Godbolt if there's some compiler optimization doing something strange.
> @thangleiter so I have not been able to replicate your problem. I made a new branch and pushed changes to it so I can test both functions simultaneously. Here's my test script:
> https://github.com/pydata/numexpr/blob/issue418/issues/issue418.py
> As you can see, I simultaneously have both versions of the function implemented. Would you look at it and see if I am doing anything wrong in my evaluation?
The implementation of `expm1x` looks good to me. However, I am noticing some really strange behavior when testing with both `expm1` and `expm1x`. Somehow, `evaluate('expm1x(z)')` gives different results depending on the implementation of `nc_expm1`. I.e., if the latter is implemented as
```c
static void
nc_expm1(npy_cdouble *x, npy_cdouble *r)
{
    double a = exp(x->real);
    r->real = a*cos(x->imag) - 1.0;
    r->imag = a*sin(x->imag);
    return;
}
```
the real part of `expm1x` is inaccurate, whereas if it is implemented as
```c
static void
nc_expm1(npy_cdouble *x, npy_cdouble *r)
{
    double a = sin(x->imag / 2);
    double b = exp(x->real);
    r->real = expm1(x->real) * cos(x->imag) - 2 * a * a;
    r->imag = b * sin(x->imag);
    return;
}
```
everything is accurate compared to `np.expm1`. So it looks like `evaluate('expm1x(z)')` actually uses `nc_expm1`.
> When you found problems with NumExpr in the initial issue report, were you using a NumExpr compiled with Intel MKL support? Either from conda or Gohlke's repo?
I am using NumExpr from conda but don't use MKL:
```
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numexpr version: 2.8.4dev1
NumPy version: 1.23.1
Python version: 3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)]
Platform: win32-AMD64-10.0.22000
CPU vendor: AuthenticAMD
CPU model: AMD Ryzen 7 PRO 4750U with Radeon Graphics
CPU clock speed: 1697 MHz
VML available? False
Number of threads used by default: 8 (out of 16 detected cores)
Maximum number of threads: 64
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
```
> @thangleiter also which compiler are you using? We can check with Godbolt if there's some compiler optimization doing something strange.
I tested this with both MSVC143 and `g++`.
Ok, sorry for the long delay, but I got frustrated with the obtuse implementation of NumExpr. I eventually gave up and just overwrote the complex `exp` function to do a comparison. Your function is 25% slower, but it is indeed significantly more accurate in the real component over a wide range of values. Therefore I have merged your changes into master. I'll prepare a new release immediately since Python 3.11 is out.