
Overhaul Qiskit Experiments (RFC0014)

nkanazawa1989 opened this issue on Sep 12 '23 · 12 comments

Epic issue to manage the implementation of RFC0014.

Migration plan:

  • Rework of composite analysis (assigned to @Musa-Sina-Ertugrul)
  • Implementation of circuit executor (not active)
  • Implementation of analysis executor (not active)
  • Implementation of executor and cleanup of experiment data (not active)
  • JAX-based curve fitter (assigned to @to24toro)
  • Lazy figure generation (https://github.com/Qiskit-Extensions/qiskit-experiments/pull/1240)

Optional:

  • Better syntax to restart experiment workflow
  • Circuit metadata validation with analysis class

nkanazawa1989 avatar Sep 12 '23 00:09 nkanazawa1989

Can I use the inliner module to prevent a huge thread stack, or should I use processes?

Musa-Sina-Ertugrul avatar Sep 13 '23 11:09 Musa-Sina-Ertugrul

Also, can you assign the first task to me?

Musa-Sina-Ertugrul avatar Sep 13 '23 11:09 Musa-Sina-Ertugrul

I'm not sure the inliner module helps performance in our case. It does optimize bytecode and reduce function call overhead, but we are targeting 1000 parallel post-processing tasks for efficient experiment execution on a 1000-qubit device. I think sub-processing is the necessary technology to avoid the Python GIL, but I'm still open to other suggestions (e.g. implementing everything in Rust).

Also, I'm a bit hesitant to add inliner as a dependency, because the software likely doesn't have any public release.
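
As a rough illustration of the sub-processing direction, here is a minimal sketch with a hypothetical analyze_component task (not a qiskit-experiments API):

from concurrent.futures import ProcessPoolExecutor

def analyze_component(data):
    # CPU-bound post-processing runs in a worker process, so the GIL of
    # the main interpreter is not a bottleneck for 1000 parallel tasks.
    return sum(data["values"])

if __name__ == "__main__":
    tasks = [{"values": [i, i + 1]} for i in range(1000)]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(analyze_component, tasks))
    print(len(results))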

nkanazawa1989 avatar Sep 14 '23 13:09 nkanazawa1989

Can we use pointers in Python to prevent object copying across processes? If this works, we don't need Ray or shared memory.

Musa-Sina-Ertugrul avatar Sep 14 '23 13:09 Musa-Sina-Ertugrul

I am trying to use tox, but I got a bunch of errors:

File "C:\Users\msert\AppData\Local\Temp\pip-build-env-25wrzqy8\overlay\lib\python3.10\site-packages\setuptools\_distutils\msvc9compiler.py", line 295, in <module>
          raise DistutilsPlatformError("VC %0.1f is not supported by this module" % VERSION)
      distutils.errors.DistutilsPlatformError: VC 6.0 is not supported by this module

Musa-Sina-Ertugrul avatar Sep 16 '23 12:09 Musa-Sina-Ertugrul

Pointers could be an option. However, I don't think tight coupling between analysis and experiment data is the way to go. Namely, the current BaseAnalysis.run takes the data from the experiment data, performs the analysis, creates analysis results, and mutates (populates) the original experiment data. I think the last operation should be part of the executor. If we delegate the mutation to the executor, the analysis doesn't need to mutate the experiment data; it just needs to return analysis result objects to the caller. This means we don't need shared memory in the first place: the analysis returns result objects to the main process, and the executor running on the main process updates the experiment data. Still, reducing object copying overhead could lead to a performance gain. In this sense, the use of pointers is worth considering, as long as it works reliably on all platforms that Qiskit must support.
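
A minimal sketch of that split, with hypothetical names (not the actual qiskit-experiments API): the analysis is a pure function returning result objects, and only the executor on the main process mutates the experiment data.

from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass, field

@dataclass
class AnalysisResult:
    name: str
    value: float

@dataclass
class ExperimentData:
    results: list = field(default_factory=list)

def run_analysis(counts):
    # Pure function: reads the data and returns results, mutating nothing,
    # so no shared memory is needed between worker and main process.
    return [AnalysisResult("mean", sum(counts) / len(counts))]

def execute(expdata, batches):
    with ProcessPoolExecutor() as pool:
        for results in pool.map(run_analysis, batches):
            # Only the executor, on the main process, mutates expdata.
            expdata.results.extend(results)

if __name__ == "__main__":
    expdata = ExperimentData()
    execute(expdata, [[1, 2, 3], [4, 5, 6]])
    print(expdata.results)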

Regarding the tox issue, I think this is a problem with your IDE (VS Code?). I only use macOS, so I can't really help with Windows issues. Perhaps @mtreinish can help?

nkanazawa1989 avatar Sep 19 '23 04:09 nkanazawa1989

The first task, the rework of composite analysis, is mainly about bootstrapping the component experiment data, so multiprocessing is not necessary (though of course you can use it). The job results contain metadata for the sub-experiments, and we should be able to create the component experiment data without running the composite analysis.
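
A hypothetical sketch of that bootstrapping step (the metadata layout shown here is illustrative, not the exact qiskit-experiments format):

def bootstrap_components(job_results):
    # Group result data by sub-experiment using the composite metadata
    # attached to each circuit; no composite analysis run is required.
    components = {}
    for datum in job_results:
        for index in datum["metadata"]["composite_index"]:
            components.setdefault(index, []).append(datum)
    return components

if __name__ == "__main__":
    job_results = [
        {"counts": {"00": 512, "11": 512},
         "metadata": {"composite_index": [0, 1]}},
    ]
    print(sorted(bootstrap_components(job_results)))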

nkanazawa1989 avatar Sep 19 '23 05:09 nkanazawa1989

Check for pointers:

from multiprocessing import Process

def multi(obj):
    # Print the address of the first element as seen inside the child process.
    print(id(obj[0]))

if __name__ == '__main__':
    obj = (1, 2, 3)
    # Note: CPython caches small integers (-5..256) at fixed addresses, so
    # id(obj[0]) can match across processes even though the objects are not
    # actually shared between them.
    p1 = Process(target=multi, args=(obj,))
    p2 = Process(target=multi, args=(obj,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

140735097377576
140735097377576

Check for tuple object:

from multiprocessing import Process

def multi(obj):
    # Print the address of the tuple itself as seen inside the child process.
    print(id(obj))

if __name__ == '__main__':
    obj = (1, 2, 3)
    # The tuple is pickled and rebuilt in each child process, so each one
    # reports a different address: the object is copied, not shared.
    p1 = Process(target=multi, args=(obj,))
    p2 = Process(target=multi, args=(obj,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

2563017131712
2466648803008

I tested it on Windows 11, but I think it is a Python trick, so it should work on other operating systems.
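
For contrast with the id() checks above, a minimal sketch of multiprocessing.shared_memory, which genuinely shares one buffer between processes instead of pickling a copy into each child:

from multiprocessing import Process, shared_memory

def reader(name):
    # Attach to the buffer created by the parent; no copy is made.
    shm = shared_memory.SharedMemory(name=name)
    print(bytes(shm.buf[:5]))
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=5)
    shm.buf[:5] = b"hello"
    p = Process(target=reader, args=(shm.name,))
    p.start()
    p.join()
    shm.close()
    shm.unlink()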

Musa-Sina-Ertugrul avatar Sep 19 '23 12:09 Musa-Sina-Ertugrul

I ran tox on my Linux laptop, and it succeeded. I think it is an OS incompatibility or something like that.

Musa-Sina-Ertugrul avatar Sep 19 '23 12:09 Musa-Sina-Ertugrul

print("0: " + str(getsizeof(table_format)))
print("1: " + str(getsizeof(tuple(table_format.values()))) + "len: " + str(len(table_format.items())))
print("2: "+ str(getsizeof((tuple(table_format.values()),))) + "len: " + str(len(((table_format.items()),))))
0: 40
1: 120 len: 10
2: 48 len: 1

There is size compression. A dict is better than a tuple, but using a dict or tuple and getting elements out of them may be ugly. Will we use them to prevent copying overhead?
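
One caveat: sys.getsizeof is shallow, so the numbers above count only the outer container, not the objects it references. A quick check:

from sys import getsizeof

inner = tuple(range(10))
outer = (inner,)
print(getsizeof(outer))  # the 1-tuple only: object header plus one pointer
print(getsizeof(inner))  # the nested tuple is accounted for separately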

Also, I found this line:

expdata.add_analysis_results(**table_format)

If we call this in the executor, the first issue will be solved, but I did not change the code, because I think the test cases were written against the current implementation. When I change the other components and add return statements, I will delete that line.

Musa-Sina-Ertugrul avatar Sep 19 '23 14:09 Musa-Sina-Ertugrul

I checked an important thing again:

from sys import getsizeof

# Animated.data is an asset table from the author's game project (see below).
print("sizeof: " + str(getsizeof(Animated.data)))

sizeof: 1584

Animated.data is from my game project, which loads game assets at compile time. We can't use a dict, because the size of a dict changes over time (as variables are added). Therefore, we have to use a tuple, or a dict inside a dict.

Musa-Sina-Ertugrul avatar Sep 19 '23 21:09 Musa-Sina-Ertugrul

Regarding

Implementation of circuit executor (not active)

there was feedback from a user that transpilation can be slow for large batch experiments. We may want to factor this into the execution design. For users with high-priority queue access, there could be a benefit to breaking up the transpilation based on the max_circuits option: transpile the first set of circuits and submit the first job before transpiling the second set. Currently, all of the circuits are transpiled first, and then all of them are submitted as a set of jobs at once.
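
A rough sketch of that interleaving, assuming backend and circuits already exist and using a hypothetical chunking helper (not the current qiskit-experiments code path):

from qiskit import transpile

def submit_in_chunks(circuits, backend, max_circuits):
    jobs = []
    for start in range(0, len(circuits), max_circuits):
        chunk = circuits[start:start + max_circuits]
        # The first job enters the queue while later chunks are still
        # being transpiled.
        jobs.append(backend.run(transpile(chunk, backend=backend)))
    return jobs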

wshanks avatar Feb 19 '24 18:02 wshanks