                        [MLIR] Enable multi threaded compilation
Context: MLIR supports multi-threaded compilation, which was disabled with a TODO notice.
Description of the Change: This PR adds a qjit option enabling MLIR multi-threaded compilation. We also take some precautions and make infrastructure updates:
- Verbose dump handlers are protected from races on the filename counter.
- A multi-threading flag has been added to the top-level qjit API.
- Timing stats are now printed to dedicated diagnostic streams.
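The first bullet can be illustrated with a minimal sketch (plain Python, not Catalyst's actual implementation): a shared dump-filename counter guarded by a lock, so concurrent compilation threads never hand out the same filename twice.

```python
# Illustrative sketch only (not Catalyst's actual code): protecting a shared
# dump-filename counter with a lock so concurrent threads get unique names.
import threading

class DumpCounter:
    """Hands out unique, ordered filenames for IR dumps."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def next_filename(self, stage):
        with self._lock:  # serialize the read-increment-format sequence
            self._count += 1
            return f"{self._count:02d}_{stage}.mlir"

counter = DumpCounter()
names = []
threads = [
    threading.Thread(target=lambda: names.append(counter.next_filename("canonicalize")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(set(names)) == 8  # every dump got a distinct filename
```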
Note: measurement results (internal document)
Benefits:
- Faster compilation of programs containing functions
- Capturing timings works from Python
Possible Drawbacks: Multi-threading issues might be uncovered.
Related GitHub Issues: closes https://github.com/PennyLaneAI/catalyst/issues/466
[sc-59511]
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 98.10%. Comparing base (a723cd1) to head (1bbf6e0).
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #655      +/-   ##
==========================================
- Coverage   99.95%   98.10%   -1.85%     
==========================================
  Files          20       69      +49     
  Lines        4444     9556    +5112     
  Branches        0      747     +747     
==========================================
+ Hits         4442     9375    +4933     
- Misses          2      147     +145     
- Partials        0       34      +34     
Hello. You may have forgotten to update the changelog!
Please edit doc/changelog.md on your branch with:
- A one-to-two sentence description of the change. You may include a small working example for new features.
- A link back to this PR.
- Your name (or GitHub username) in the contributors section.
> Possible Drawbacks: Some multi-threading MLIR issues might be uncovered?

Did you find any?

Nope, everything seems to be fine so far.

One thing we should test this with is the instrumentation. I don't think we have frontend tests for that feature yet though, until #597 is merged.

Do you want me to do this or are you going to check?
@dime10 @grwlf what do you guys think about turning it off by default for the release, and then turning it on by default after the release to test more thoroughly?

Great idea!
Looking into MLIR's documentation:

> The structure of the pass manager, and the concept of nesting, is detailed further below. All passes in MLIR derive from OperationPass and adhere to the following restrictions; any noncompliance will lead to problematic behavior in multithreaded and other advanced scenarios.

and later below:

> Must not modify the state of operations other than the operations that are nested under the current operation. This includes adding, modifying or removing other operations from an ancestor/parent block. Other threads may be operating on these operations simultaneously. As an exception, the attributes of the current operation may be modified freely. This is the only way that the current operation may be modified. (I.e., modifying operands, etc. is not allowed.)
I believe we have not been following this. Specifically, in gradients and other passes we add functions to the module.
I think we should not make this the default yet. But if we are willing to increase testing time, we could add a small set of compilation tests every so often to see if we catch any errors.
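The restriction quoted above can be sketched in plain Python (this is a conceptual analogy, not MLIR C++): rewriting each function in parallel is safe as long as every worker only touches state nested under its own function, while adding functions to the shared module from a worker mutates shared state and needs synchronization.

```python
# Conceptual sketch: why function-scoped rewrites parallelize safely while
# module-level mutation (e.g. adding functions) from workers does not.
from concurrent.futures import ThreadPoolExecutor
import threading

# A toy "module" mapping function names to lists of ops.
module = {"funcs": {"f0": ["op_a"], "f1": ["op_b"], "f2": ["op_c"]}}
module_lock = threading.Lock()

def rewrite_function(name):
    # Safe: each worker mutates only the ops nested under its own function,
    # mirroring the OperationPass restriction quoted from the MLIR docs.
    module["funcs"][name] = [op.upper() for op in module["funcs"][name]]

def add_helper_function(name):
    # What passes like 'lower-gradients' effectively do: add a function to
    # the module. Shared-state mutation, so it must be serialized.
    with module_lock:
        module["funcs"][f"{name}_helper"] = ["helper_op"]

with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(rewrite_function, ["f0", "f1", "f2"]))

print(module["funcs"]["f0"])  # ['OP_A']
```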
I'll be running this PR in a bit looking for potential issues (or a reason why they haven't been seen). Thanks!
It looks like one of the reasons why we are not taking advantage of parallelism is because all our passes act on a module scope:
```
include/Catalyst/Transforms/Passes.td:def CatalystBufferizationPass : Pass<"catalyst-bufferize"> {
include/Catalyst/Transforms/Passes.td:def ArrayListToMemRefPass : Pass<"convert-arraylist-to-memref"> {
include/Catalyst/Transforms/Passes.td:def CatalystConversionPass : Pass<"convert-catalyst-to-llvm"> {
include/Catalyst/Transforms/Passes.td:def ScatterLoweringPass : Pass<"scatter-lowering"> {
include/Catalyst/Transforms/Passes.td:def HloCustomCallLoweringPass : Pass<"hlo-custom-call-lowering"> {
include/Catalyst/Transforms/Passes.td:def QnodeToAsyncLoweringPass : Pass<"qnode-to-async-lowering"> {
include/Catalyst/Transforms/Passes.td:def AddExceptionHandlingPass : Pass<"add-exception-handling"> {
include/Catalyst/Transforms/Passes.td:def GEPInboundsPass : Pass<"gep-inbounds"> {
include/Gradient/Transforms/Passes.td:def GradientBufferizationPass : Pass<"gradient-bufferize"> {
include/Gradient/Transforms/Passes.td:def GradientLoweringPass : Pass<"lower-gradients"> {
include/Gradient/Transforms/Passes.td:def GradientConversionPass : Pass<"convert-gradient-to-llvm"> {
include/Mitigation/Transforms/Passes.td:def MitigationLoweringPass : Pass<"lower-mitigation"> {
include/Quantum/Transforms/Passes.td:def QuantumBufferizationPass : Pass<"quantum-bufferize"> {
include/Quantum/Transforms/Passes.td:def QuantumConversionPass : Pass<"convert-quantum-to-llvm"> {
include/Quantum/Transforms/Passes.td:def EmitCatalystPyInterfacePass : Pass<"emit-catalyst-py-interface"> {
include/Quantum/Transforms/Passes.td:def CopyGlobalMemRefPass : Pass<"cp-global-memref"> {
include/Quantum/Transforms/Passes.td:def AdjointLoweringPass : Pass<"adjoint-lowering"> {
include/Quantum/Transforms/Passes.td:def RemoveChainedSelfInversePass : Pass<"remove-chained-self-inverse"> {
include/Quantum/Transforms/Passes.td:def AnnotateFunctionPass : Pass<"annotate-function"> {
include/Test/Transforms/Passes.td:def TestPass : Pass<"test"> {
```
I think in order to take advantage, we would need to change these to `Pass<"foo", FuncOp>` or similar. Let me give it a try.
I am currently running into a weird error :sweat_smile: All tests pass except one for printing the IR. When the IR is printed, it gets printed nine times. Even if it were not printed nine times, the test would still fail.

Can you reproduce it locally? I've re-triggered a run of the code.

```
FAILED frontend/test/pytest/test_debug.py::TestPrintStage::test_hlo_lowering_stage - AssertionError: assert 'stablehlo.constant' not in 'module @fun...rn\n  }\n}\n'
```
Oh, it turns out I broke the PR with my 'simplification' commit. I have reverted it and added comments. The problem was that the dump handler was triggered by all passes instead of only by passes that end a pipeline.
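The fix described above can be sketched like this (the pipeline names below are hypothetical, chosen only for illustration; the pass names are real upstream/Catalyst passes): fire the verbose IR dump only for the pass that ends a pipeline, not after every individual pass.

```python
# Hedged sketch of the described fix: dump IR once per pipeline (after its
# final pass), rather than once per pass. Pipeline names are hypothetical.
pipelines = {
    "BufferizationPipeline": ["one-shot-bufferize", "buffer-deallocation"],
    "LLVMLoweringPipeline": ["convert-catalyst-to-llvm", "reconcile-unrealized-casts"],
}

def ends_pipeline(pipeline_name, pass_name):
    """Return True only when pass_name is the final pass of its pipeline."""
    return pipelines[pipeline_name][-1] == pass_name

# Collect the (pipeline, pass) pairs that would actually trigger a dump.
dumps = [
    (pipe, p)
    for pipe, passes in pipelines.items()
    for p in passes
    if ends_pipeline(pipe, p)
]
print(dumps)  # one dump per pipeline, not one per pass
```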