qsim
Benchmarking of Qsim with Cirq
Need to benchmark qsim against Cirq and publish documentation of the results.
Would the following help? https://colab.research.google.com/drive/1354IM8hwPU0la4Jv7-b2vwVqrEgasOsq?usp=sharing
This simple benchmark shows that qsimcirq is about 5x faster than the Cirq simulator. The benchmark is based on https://github.com/quantumlib/qsim/blob/e4dadfe84e26b2b0d8165501f2cb56cddd9f172f/qsimcirq_tests/qsimcirq_test.py#L44
We have a simple comparison of qsim vs. the Cirq simulator in the tutorial doc, but I imagine this issue is looking towards a more complete benchmark.
Cross-reference: https://github.com/quantumlib/Cirq/issues/1124. We could possibly implement this under the same suite of benchmarks.
Hello,
I compared qsim with the standard Cirq simulator on 2 AMD EPYC 7502 CPUs (64 cores) with 512 RAM, and compared both with the best GPU Cirq simulator I was able to find (QuLacs-GPU) on a v100 GPU, among others (see plot 1). For this comparison, I used the runtime of the Quantum Fourier Transformation (QFT) as a function of the number of simulated qubits. One reason for choosing QFT as a simulator benchmark was that comparable benchmark data exists (e.g. https://arxiv.org/abs/2009.01845, which compares a custom simulator with the standard Cirq Python simulator). Another reason was that QFT, with its very sparse 2-qubit controlled-Z power gates and otherwise only 1-qubit gates, is a rather ‘worst case scenario’ for qsim and its gate fusion. Hence, it should give a reasonable lower bound on the performance improvement from using qsim instead of the Cirq Python simulator.
Still, plot 1 shows that qsim with fused gates and 64 threads starts outperforming every other Cirq simulation method considered for >21 qubits. At 29 qubits, where the GPUs run out of RAM, qsim on the 2 AMD EPYC 7502 CPUs is about a factor of 2 faster than QuLacs-GPU on a v100 GPU; with qsim on one AMD EPYC 7502 CPU, the two are comparable. The other plot shows how qsim uses the available resources, as well as the effect of fused gates, which reduce the total data volume by a factor of approx. 2.5. Similarly, the runtime is reduced by a factor of 2 with fused gates. The maximal memory bandwidth of one AMD EPYC 7502 CPU is about 200 GB/s, meaning that about half the maximal bandwidth is used. Likewise, the maximal floating-point throughput is about 1,100 GFLOP/s for one CPU, which indicates that qsim with fused gates uses nearly 70 % of that.
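As a rough back-of-the-envelope check on the numbers above, the sketch below works out how much memory a full state vector needs and what the quoted utilization fractions correspond to in absolute terms. The helper name `state_vector_gib` is made up for this example. The bytes per amplitude (8 for single-precision complex, 16 for double-precision complex) are assumptions, since the thread does not state which precision each simulator uses, and the GPU memory size is not given either.

```python
def state_vector_gib(n_qubits: int, bytes_per_amplitude: int) -> float:
    """Memory for a full state vector of 2**n_qubits amplitudes, in GiB."""
    return (2**n_qubits * bytes_per_amplitude) / 2**30


# Assumed amplitude sizes; the thread does not say which one applies where.
SINGLE_PRECISION_COMPLEX = 8   # two 32-bit floats
DOUBLE_PRECISION_COMPLEX = 16  # two 64-bit floats

for n_qubits in (28, 29, 30):
    single = state_vector_gib(n_qubits, SINGLE_PRECISION_COMPLEX)
    double = state_vector_gib(n_qubits, DOUBLE_PRECISION_COMPLEX)
    print(f"{n_qubits} qubits: {single:.0f} GiB single, {double:.0f} GiB double")

# Peak figures for one AMD EPYC 7502 CPU, as quoted in the text.
peak_bandwidth_gb_s = 200
peak_gflops = 1100

# "About half" of the bandwidth and "nearly 70 %" of the FLOP rate.
print(f"bandwidth used: about {0.5 * peak_bandwidth_gb_s:.0f} GB/s")
print(f"compute used:   up to about {0.7 * peak_gflops:.0f} GFLOP/s")
```

The state vector doubles in size with every added qubit, which is why a fixed amount of GPU memory becomes a hard limit within one or two qubits of the largest size that still fits.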