Benchmarking run times of the CPU and GPU solvers for comparison
We've added a GPU solver https://github.com/USCqserver/OpenQuantumTools.jl/blob/f0e34ec2f94357d9bc1c2fa65459646a8c5b3857/src/QSolver/closed_system_solvers.jl#L30-L44, which will be integrated more fully soon. Informally, we've seen that for n = 10 qubits and a tf = 10 ns anneal, the GPU version takes around 1 second and the CPU version around 8 seconds. We'd like to run more systematic benchmarks to show when the GPU solver gives an improvement and by how much.
The informal timing was done on the USC Discovery cluster, using 2 CPUs and 1 GPU, with the following test code: https://github.com/naezzell/accelqat/blob/b617c423daaa4cb0ab2f4c1a4d8f2536fb9f7bb3/cuda/try_gpu_accel.jl#L1-L66
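Answering "when and by how much" takes repeated timings at each problem size, with warm-up runs discarded, plus a CPU/GPU speedup ratio per size. The following is a minimal sketch of that bookkeeping, written in Python only for illustration (the solvers themselves are Julia). `median_runtime`, `speedup_table`, and `crossover` are hypothetical helper names, not part of OpenQuantumTools.jl, and the sketch assumes each solver run can be wrapped in a zero-argument callable that returns only after the computation has finished.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def median_runtime(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock time (seconds) of fn() over `repeats` timed runs.

    Warm-up calls are executed but not timed, so one-time costs
    (e.g. compilation or GPU context setup) do not skew the result.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup_table(cpu_times: Dict[int, float], gpu_times: Dict[int, float]) -> Dict[int, float]:
    """Speedup = CPU time / GPU time, for every qubit count n measured on both."""
    return {n: cpu_times[n] / gpu_times[n] for n in sorted(cpu_times) if n in gpu_times}


def crossover(speedups: Dict[int, float]) -> Optional[int]:
    """Smallest measured n from which the GPU is faster (speedup > 1)
    for every larger measured n; None if the largest n shows no gain."""
    result = None
    for n in sorted(speedups, reverse=True):
        if speedups[n] > 1.0:
            result = n
        else:
            break
    return result
```

For example, the informal n = 10 numbers give `speedup_table({10: 8.0}, {10: 1.0}) == {10: 8.0}`, i.e. an 8x speedup. The same pattern carries over to Julia; since GPU work is launched asynchronously, the timed call should synchronize (e.g. with `CUDA.@sync`) before the clock stops.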
We have done a preliminary comparison benchmark for the GPU-accelerated solver. It shows an advantage once the initial Hamiltonian is large enough.
The test code can be found at naezzell/accelqat/cuda/scaling_test.jl.
The test results and the relevant CPU information can be found in naezzell/accelqat/cuda/scaling_test_result/ (run with one NVIDIA Tesla K40 GPU).
