qsim icon indicating copy to clipboard operation
qsim copied to clipboard

Benchmarking of Qsim with Cirq

Open karlunho opened this issue 4 years ago • 4 comments

Need to do a benchmark fo qsim with cirq, and publish documentation.

karlunho avatar Jun 09 '20 07:06 karlunho

Would the following help? https://colab.research.google.com/drive/1354IM8hwPU0la4Jv7-b2vwVqrEgasOsq?usp=sharing

This simple benchmark shows that qsimcirq is 5x faster. The benchmark is based on https://github.com/quantumlib/qsim/blob/e4dadfe84e26b2b0d8165501f2cb56cddd9f172f/qsimcirq_tests/qsimcirq_test.py#L44

alexandrupaler avatar Jun 09 '20 07:06 alexandrupaler

We have a simple comparison of qsim vs. the Cirq simulator in the tutorial doc, but I imagine this issue is looking towards a more complete benchmark.

95-martin-orion avatar Oct 05 '20 15:10 95-martin-orion

Cross ref https://github.com/quantumlib/Cirq/issues/1124 we could possibly implement this under the same suite of benchmarks.

balopat avatar Dec 03 '20 22:12 balopat

Hello,

I compared qsim with the standard Cirq simulator on 2 AMD EPYC 7502 CPUs (64 cores) with 512 RAM as well as both with the best GPU Cirq simulator (QuLacs-GPU), I was able to find, on a v100 GPU among others (see plot 1). For this comparison, I used the runtime of the Quantum Fourier Transformation (QFT) dependent on the simulated amount of qubits. One reason for choosing QFT as a simulator benchmarking algorithm was that comparable benchmark data exists (e.g. https://arxiv.org/abs/2009.01845 comparing their custom simulator with the standard Cirq python simulator). Another reason was, that QFT with its very sparse 2-qubit controlled Z power gates and otherwise only 1-qubit gates is a rather ‘worst case scenario’ for qsim and its gate fusion. Hence, it should allow to obtain a reasonable lower bound on the performance improvement when using qsim compared to the Cirq python simulator.

Still, in plot 1 one can see that qsim with fused gates and 64 threads starts outperforming any other considered Cirq simulation method for >21 qubits. At 29 qubits when the GPUs run out of RAM qsim at the 2 AMD EPYC 7502 CPUs is about a factor of 2 faster than QuLacs-GPU on a v100 GPU; or respectively they are comparable for qsim on one AMD EPYC 7502 CPU. In the other plot is shown how qsim uses the available resources as well as the effect of fused gates, which reduce the total data volume by a factor of approx. 2.5. Similarly, the runtime is reduced by a factor of 2 with fused gates. The maximal memory bandwidth for one AMD EPYC 7502 CPU is about 200GB/s, meaning that about half the maximal bandwidth is used. Respectively, the maximal floating-point operations per second are about 1,100 GFLOP/s for one CPU which indicates that qsim with fused gates uses nearly 70 % of that.

plot1.pdf plot2.pdf

TimoEckstein avatar Mar 25 '21 12:03 TimoEckstein