qiskit Some unit tests crash when using symengine

Environment

Qiskit version: 1.1.0rc1 (but also in previous stable versions)
Python version: 3.12.3
Operating system: Arch Linux

What is happening?

Some unit tests fail due to symengine crashes. For example, the following five tests from test_circuit_load_from_qpy.py crash on my system:

test_parameter_expression
test_evolutiongate_param_expr_time
test_parameter_expression_global_phase
test_parameter_vector_element_in_expression
test_symengine_full_path

From the first test, the exact line when the the test crashes is this: https://github.com/Qiskit/qiskit/blob/6ff0eb54f88cdbe98275872b8ec9d65f76948bb4/qiskit/qpy/binary_io/value.py#L58

Which produces a src/tcmalloc.cc:304] Attempt to free invalid pointer 0x6146ecca5b60.

How can we reproduce the issue?

Here is a small self-contained script that can replicate the crash described above.

from qiskit.circuit.parameter import Parameter
from qiskit.circuit.parameterexpression import ParameterExpression


theta = Parameter("theta")
phi = Parameter("phi")
sum_param = theta + phi

sum_param._symbol_expr.__reduce__()
# src/tcmalloc.cc:304] Attempt to free invalid pointer 0x645c8e3eaa30 
# [1]    987079 IOT instruction (core dumped)  python

The following running on the same system works:

import symengine

theta = symengine.Symbol("theta")
phi = symengine.Symbol("phi")
sum_param = theta + phi

sum_param.__reduce__()

What should happen?

Ideally, all unit tests would pass in a clean chroot env on Arch Linux.

Any suggestions?

No response

May 07 '24 15:05 iyanmv

For some reason, if symengine is imported, there is no crash, i.e.,

import symengine
from qiskit.circuit.parameter import Parameter
from qiskit.circuit.parameterexpression import ParameterExpression


theta = Parameter("theta")
phi = Parameter("phi")
sum_param = theta + phi

sum_param._symbol_expr.__reduce__()

May 07 '24 16:05 iyanmv

Thanks for the report. Do you know what version of symengine you're using?

Fwiw, it would be really pretty weird if this is caused by Qiskit and not some allocator problem within Symengine - our use of it is very much inline with their public interface, and we don't go messing around in their internals or anything.

May 07 '24 17:05 jakelishman

I'm using symengine 0.11.2, and Arch Linux builds the library with the following options:

cmake -B build -S $pkgname-$pkgver \
  -DCMAKE_INSTALL_PREFIX=/usr \
  -DBUILD_SHARED_LIBS=ON \
  -DWITH_TCMALLOC=ON \
  -DWITH_PTHREAD=ON \
  -DWITH_SYMENGINE_THREAD_SAFE=ON \
  -DWITH_ARB=ON \
  -DWITH_ECM=ON \
  -DWITH_LLVM=ON \
  -DWITH_MPFR=ON \
  -DWITH_MPC=ON \
  -DWITH_PRIMESIEVE=ON \
  -DWITH_BOOST=ON \
  -DWITH_COTIRE=OFF \
  -DWITH_SYSTEM_CEREAL=ON
cmake --build build

May 07 '24 20:05 iyanmv

If you installed symengine from pip or conda, you'd probably have got their pre-packaged wheels, which is the same situation we're usually expecting. If you're using it from Arch's repos in the system Python installation, then this could very well be a difference: I don't think symengine's published wheels use tcmalloc, and it could be some fault there (or perhaps some questionable behaviour that happens to work with glibc's malloc, but tcmalloc is rejecting).

The difference between pre-importing symengine and not probably isn't indicative of everything being super fixed (since qiskit does just import symengine internally) - it's more likely a spooky-action-at-a-distance memory bug.

If you have the means to try, would you be able to see if the error repeats if you drop the -DWITH_TCMALLOC=ON (or manually set it to off) while you're building the symengine your Python installation is linking against?

May 07 '24 20:05 jakelishman

I was going to do that experiment next, yes. I will rebuild symengine and python-symengine without tcmalloc and see if that solves the issue.

I just installed qiskit quickly in a venv and when I use the symengine from PyPI I also can't replicate the issue.

May 07 '24 21:05 iyanmv

I can confirm those tests don't fail when using symengine without tcmalloc. I will create an issue on Arch Linux.

May 10 '24 08:05 iyanmv

Thanks so much for looking into it in depth! I guess if the issue is somewhere in Arch / tcmalloc / symengine, we can close this issue on Qiskit? I'll close it, but feel free to re-open if there's more to discuss.

May 10 '24 10:05 jakelishman

After solving the crashes due to tcmalloc, I found another error related to symengine (this time a numerical rounding error).

In particular is this test: https://github.com/Qiskit/qiskit/blob/235e581b1f76f29add0399989ed47f47a4e98bb8/test/python/qobj/test_pulse_converter.py#L343

It works running on a virtual env, but if fails using system shared libraries. It's a numerical error happening in this line: https://github.com/Qiskit/qiskit/blob/235e581b1f76f29add0399989ed47f47a4e98bb8/qiskit/circuit/parameterexpression.py#L534

Running on my system real_expr has type symengine.lib.symengine_wrapper.RealDouble and value 3140000.0. However, running float(real_expr) returns 3139999.9999999995. I think this possibility was considered here but it doesn't work as expected: https://github.com/Qiskit/qiskit/blob/235e581b1f76f29add0399989ed47f47a4e98bb8/qiskit/pulse/utils.py#L71-L74

The reason is that decimal has a value 10 so this error is not removed with the rounding. Passing decimals=9 fixes the issue for me.

Note also this relevant comment from NumPy (see Notes):

np.round uses a fast but sometimes inexact algorithm to round floating-point datatypes. For positive decimals it is equivalent to np.true_divide(np.rint(a * 10decimals), 10decimals), which has error due to the inexact representation of decimal fractions in the IEEE floating point standard [1] and errors introduced when scaling by powers of ten. (...) If your goal is to print such values with a fixed number of decimals, it is preferable to use numpy’s float printing routines to limit the number of printed decimals: (...) The float printing routines use an accurate but much more computationally demanding algorithm to compute the number of digits after the decimal point. Alternatively, Python’s builtin round function uses a more accurate but slower algorithm for 64-bit floating point values:

Perhaps it's better to use round(operand, ndigits=decimal) and change the default value of decimal from 10 to 9? What do you think? I can make a PR if you think it's a good idea (I will check first that there are no other side effects after changing that).

May 10 '24 11:05 iyanmv

Sorry, I guess round() is out of the table because of complex numbers.

May 10 '24 12:05 iyanmv

Also, after thinking a bit I don't understand why there should be such a rounding error in the first place. Do you know why is that error expected from that comment in the code and why decimal has a default value of 10 instead of 14 for example (which should be fine with double precision I guess)? I also can't replicate the error using pure symengine code. I can try to compile symengine without MPFR as a wild guess... but I don't understand what is going on.

May 10 '24 12:05 iyanmv

No difference compiling symengine with -DWITH_MPFR=OFF

May 12 '24 12:05 iyanmv

I noticed that the ParameterExpression(f/1000) is multiplied by a float here: https://github.com/Qiskit/qiskit/blob/f20dad0d03281bf3e628a7c32c117f41140365d4/qiskit/qobj/converters/pulse_instruction.py#L780

This causes frequency to have the value ParameterExpression(1000000.0*f) in this particular unit test instead of ParameterExpression(1000000*f) which could have be obtained by multiplying by an integer instead, i.e.,

frequency = self.disassemble_value(instruction.frequency) * 10**9

This is what ultimately leads to the numerical error that could have been avoided and has nothing to do with the double precision of the float type.

May 12 '24 14:05 iyanmv

And here is a small self-contained example replicating the issue with pure symengine:

import symengine

f = symengine.Symbol("f")
expr = f / 1000

expr_wrong = expr * 1e9
expr_good = expr * 10**9

wrong_value = float(expr_wrong.subs("f", 3.14))
good_value = float(expr_good.subs("f", 3.14))

print(wrong_value)
# 3139999.9999999995
print(good_value)
# 3140000.0

May 12 '24 14:05 iyanmv

Sorry for the lack of response here - I reopened the issue, started typing a comment, then got distracted and forgot I hadn't posted.

Thanks so much for tracking this down and providing the fix - that's a very tricksy bug to find, and thanks for filing it on symengine.

May 15 '24 17:05 jakelishman

No problem ;)

May 16 '24 05:05 iyanmv

qiskit qiskit copied to clipboard

Some unit tests crash when using symengine

Environment

What is happening?

How can we reproduce the issue?

What should happen?

Any suggestions?

qiskit
qiskit copied to clipboard