[RFC] Unitary Synthesis

Open khalatepradnya opened this issue 1 year ago • 7 comments

Describe the feature

Problem

Given an arbitrary user-provided unitary matrix, synthesize it into a sequence of quantum gates.
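
To make the problem concrete, here is a minimal sketch of the smallest case: factoring a single-qubit unitary into a global phase and three rotations, U = e^{iα} Rz(φ) Ry(θ) Rz(λ). This is a textbook ZYZ decomposition written with NumPy for illustration only. It is not the synthesis algorithm CUDA-Q uses, and `zyz_angles` and `rebuild` are hypothetical helper names, not CUDA-Q APIs.

```python
import numpy as np

def rz(angle):
    """Rotation about Z: diag(e^{-i angle/2}, e^{+i angle/2})."""
    return np.array([[np.exp(-0.5j * angle), 0], [0, np.exp(0.5j * angle)]])

def ry(angle):
    """Rotation about Y."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def zyz_angles(u):
    """Return (alpha, phi, theta, lam) with u = e^{i alpha} Rz(phi) Ry(theta) Rz(lam)."""
    u = np.asarray(u, dtype=complex)
    # Strip the global phase so that the remainder has determinant 1.
    alpha = 0.5 * np.angle(np.linalg.det(u))
    v = u * np.exp(-1j * alpha)
    # In SU(2): |v00| = cos(theta/2) and |v10| = sin(theta/2).
    theta = 2 * np.arctan2(abs(v[1, 0]), abs(v[0, 0]))
    # arg(v11) = (phi + lam) / 2 and arg(v10) = (phi - lam) / 2.
    a11, a10 = np.angle(v[1, 1]), np.angle(v[1, 0])
    return alpha, a11 + a10, theta, a11 - a10

def rebuild(alpha, phi, theta, lam):
    """Multiply the factors back together to check the decomposition."""
    return np.exp(1j * alpha) * rz(phi) @ ry(theta) @ rz(lam)

custom_h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
assert np.allclose(rebuild(*zyz_angles(custom_h)), custom_h)
```

Multi-qubit unitaries need additional machinery (two-qubit gates and a recursive decomposition), which is where the timeout, gate count, and tolerance limits listed under Expectations become relevant.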

Expectations

  • User provides an arbitrary unitary matrix as a custom quantum operation.
  • The custom operation can be used as a regular CUDA-Q supported quantum operation.
    • Q: Broadcast (same operation on multiple qubits): Out of scope
  • The allowed set of quantum gates for synthesis depends on the backend target.
    • Q: Allow user to specify set of allowed gates: Out of scope
  • CUDA-Q throws an error if a unitary cannot be synthesized within reasonable limits.
    • 'Reasonable' accounts for a time limit (timeout), a gate count limit (upper threshold), and a tolerance on how close the synthesized circuit is to the input unitary.
  • Parameterized custom operations will be covered in a follow-up RFC.
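
The tolerance criterion can be sketched as a distance between the input unitary and the matrix of the synthesized circuit. The metric below, 1 - |Tr(U†V)| / d, ignores global phase. It is only one possible choice and an assumption here; the RFC does not say which metric or default tolerance CUDA-Q uses, and both function names are hypothetical.

```python
import numpy as np

def phase_invariant_distance(target, candidate):
    """Return 1 - |Tr(target^dagger @ candidate)| / dim.

    The value is (numerically) 0 when the two unitaries agree up to a
    global phase and grows as they diverge.
    """
    target = np.asarray(target, dtype=complex)
    candidate = np.asarray(candidate, dtype=complex)
    dim = target.shape[0]
    overlap = abs(np.trace(target.conj().T @ candidate)) / dim
    return 1.0 - overlap

def within_tolerance(target, candidate, tol=1e-9):
    """Hypothetical acceptance test; the default tol is an illustrative value."""
    return phase_invariant_distance(target, candidate) <= tol

h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
x = np.array([[0, 1], [1, 0]])

print(within_tolerance(h, 1j * h))  # same gate up to a global phase
print(within_tolerance(h, x))       # a different gate
```

A synthesizer following the expectations above would stop, and raise an error, once the timeout or gate count limit is reached without any candidate passing such a check.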

User API

  • Python
import cudaq
import numpy as np

cudaq.register_operation("custom_h", 1. / np.sqrt(2.) *  np.array([[1, 1], [1, -1]])) 
cudaq.register_operation("custom_x", np.array([[0, 1], [1, 0]])) 

@cudaq.kernel 
def bell(): 
  qubits = cudaq.qvector(2) 
  custom_h(qubits[0]) 
  custom_x.ctrl(qubits[0], qubits[1]) 

counts = cudaq.sample(bell) 
counts.dump()

  • C++
#include <cudaq.h>
#include <cmath>
#include <complex>
#include <iostream>
#include <vector>

// Macro to specify the custom unitary operation
cudaq_register_operation(custom_h, 1, 0,
                         (std::vector<std::vector<std::complex<double>>>{
                             {M_SQRT1_2, M_SQRT1_2}, {M_SQRT1_2, -M_SQRT1_2}}));
cudaq_register_operation(
    custom_x, 1, 0, (std::vector<std::vector<std::complex<double>>>{{0, 1}, {1, 0}}));

void custom_operation() __qpu__ {
  cudaq::qvector qubits(2);
  custom_h(qubits[0]);
  custom_x.ctrl(qubits[0], qubits[1]);
}

int main() {
  auto result = cudaq::sample(custom_operation);
  std::cout << result.most_probable() << '\n';
  return 0;
}
  • The user must provide a valid unitary matrix (CUDA-Q will not check or enforce this requirement).
  • Ordering: The user-provided matrix must be in row-major format.
  • Endianness: The user-provided matrix is interpreted as big-endian (the convention commonly used in physics textbooks).
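
Because CUDA-Q does not validate the matrix, a user may want to check it before registering. The sketch below uses plain NumPy; `is_unitary` is a hypothetical helper, not part of CUDA-Q. It also illustrates the big-endian convention, under the usual reading that the first qubit an operation is applied to is the most significant bit of the basis-state index.

```python
import numpy as np

def is_unitary(matrix, atol=1e-10):
    """Return True if matrix is square and matrix^dagger @ matrix is the identity."""
    m = np.asarray(matrix, dtype=complex)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    return bool(np.allclose(m.conj().T @ m, np.eye(m.shape[0]), atol=atol))

# Nested lists are written row by row, which matches the row-major requirement.
# Big-endian CNOT with the control on the first qubit: |10> <-> |11>.
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
print(is_unitary(cnot))

# Big-endian ordering: X on the first qubit is kron(X, I), so |00> maps to |10>.
x = np.array([[0, 1], [1, 0]])
x_on_first = np.kron(x, np.eye(2))
ket00 = np.array([1, 0, 0, 0])
flipped = x_on_first @ ket00
print(format(int(np.argmax(flipped)), "02b"))
```

In a little-endian convention the same physical operation would be written as kron(I, X), so a matrix copied from a tool with the opposite convention needs its qubit order reversed first.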

Constraints

  • Size of unitary matrix: limited to 8 qubits, i.e., a 256 x 256 matrix (2^8 = 256).
  • The custom operation must be defined outside of a quantum kernel (e.g., a call to register_operation cannot appear inside a function decorated with @cudaq.kernel).
  • The tolerance for the synthesized circuit and the gate count limit will be default values determined by CUDA-Q.
  • The custom operation definition is restricted to qubits (cudaq::qudit<2>).
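
The size and qubit-only constraints can be checked up front. The following is a minimal sketch of such a check, assuming the 8-qubit limit stated above; `num_qubits_for` and `MAX_QUBITS` are hypothetical names, and the actual error handling in CUDA-Q may differ.

```python
import numpy as np

MAX_QUBITS = 8  # limit stated in this RFC

def num_qubits_for(matrix):
    """Return the number of qubits a matrix acts on, or raise ValueError."""
    m = np.asarray(matrix)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        raise ValueError("matrix must be square")
    dim = m.shape[0]
    n = dim.bit_length() - 1
    # Qubit-only operations have dimension exactly 2^n with n >= 1.
    if dim < 2 or (1 << n) != dim:
        raise ValueError(f"dimension {dim} is not a power of two")
    if n > MAX_QUBITS:
        raise ValueError(f"{n} qubits exceeds the limit of {MAX_QUBITS}")
    return n

print(num_qubits_for(np.eye(4)))    # two-qubit operation
print(num_qubits_for(np.eye(256)))  # largest allowed size
```

A 3 x 3 matrix (a qutrit operation) or a 512 x 512 matrix (9 qubits) would be rejected by this check.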

Workflow

  • In simulation, no synthesis will happen.
  • Compiler will automatically synthesize the matrix when targeting hardware.
  • Explicit synthesis mechanism (API or command-line argument) - Out of scope for the first iteration
  • The NVQC target behaves the same as when running locally.
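
The "no synthesis in simulation" path can be pictured as applying the registered matrices directly to the state vector. The NumPy sketch below reproduces the bell kernel from the User API section this way. It is a conceptual illustration, not CUDA-Q's simulator code; `controlled` is a hypothetical helper, and the big-endian ordering follows the convention stated in this RFC.

```python
import numpy as np

custom_h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
custom_x = np.array([[0, 1], [1, 0]])

def controlled(u):
    """Big-endian controlled-U with the control on the first (most significant) qubit."""
    d = u.shape[0]
    out = np.eye(2 * d, dtype=complex)
    out[d:, d:] = u  # apply u only when the control bit is 1
    return out

# Start in |00>.
state = np.zeros(4, dtype=complex)
state[0] = 1

# custom_h(qubits[0]): H on the first qubit, identity on the second.
state = np.kron(custom_h, np.eye(2)) @ state

# custom_x.ctrl(qubits[0], qubits[1]): X on the second qubit, controlled by the first.
state = controlled(custom_x) @ state

# Measurement probabilities keyed by bitstring, skipping zero amplitudes.
probs = {format(i, "02b"): float(abs(a) ** 2)
         for i, a in enumerate(state) if abs(a) > 1e-12}
print(probs)
```

The result has equal weight on 00 and 11, which is the Bell state the sampled kernel is expected to produce. When targeting hardware, the same matrices would instead go through synthesis into the backend's native gates.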

Work items / TO-DOs

  • [x] Support in simulation for Python
    • [x] Kernel mode
    • [x] Builder mode
    • [x] State vector simulators
    • [x] Tensornet simulators
  • [x] Support in simulation for C++
    • [x] Library mode
    • [x] MLIR mode
  • [x] Add generic synthesis for emulation
  • [x] Error handling: Gracefully handle user errors, feature constraints and runtime errors
  • [ ] Support synthesis per hardware backend
  • [ ] ~~Comprehensive documentation and useful example(s)~~: Covered in issue #2002

khalatepradnya • Apr 04 '24 16:04

Specifically, I'm not entirely sure what the intended semantics of the following code are.

cudaq_register_op("custom_h",
                  {{M_SQRT1_2, M_SQRT1_2}, {M_SQRT1_2, -M_SQRT1_2}});
cudaq_register_op("custom_x", {{0, 1}, {1, 0}});

These are calls? Macros? What exactly is being registered? And with what?

These aren't marked as __qpu__ code so will be entirely opaque to the compiler at first blush. Hence, the compiler cannot generate quake code for them.

schweitzpgi • May 09 '24 18:05

Second order question: it may be possible for the compiler to take a constant matrix here and generate a gate list (approximation) from those values. Or perhaps this should be generated entirely in the control hardware at QIR time? And what about the synthesis case? If the compiler is going to generate the gate list, it stands to reason that it will need to do so at synthesis-time. And that affects the IR, which would need to support dynamic matrix specifications that can be instantiated by the synthesizer.

schweitzpgi • May 09 '24 18:05

> These are calls? Macros? What exactly is being registered? And with what?

Macros. Updated the code snippet in description.

khalatepradnya • May 15 '24 00:05

Will this PR provide unitary decomposition like what qiskit's transpile does? https://github.com/NVIDIA/cuda-quantum/pull/1781

ACE07-Sev • Jun 16 '24 12:06

> Will this PR provide unitary decomposition like what qiskit's transpile does? #1781

Conceptually, yes. However, the synthesis mechanism and target gateset will be implicit in this iteration.

khalatepradnya • Jul 15 '24 16:07

Question. I am trying to compare the depth of cuda-quantum's implementation of QSD (I assume it's QSD) vs Qiskit's implementation. May I ask how I can see the depth of the circuit in terms of U3 and CX gates?

ACE07-Sev • Sep 26 '24 17:09

> Question. I am trying to compare the depth of cuda-quantum's implementation of QSD (I assume it's QSD) vs Qiskit's implementation. May I ask how I can see the depth of the circuit in terms of U3 and CX gates?

Thank you for the question. This feature is not yet available in CUDA-Q. I will update this issue when it becomes available.

khalatepradnya • Sep 26 '24 18:09