                        [MLIR] Enable multi threaded compilation
Context: MLIR supports multi-threaded compilation, which was disabled with a TODO notice.
Description of the Change: This PR adds a qjit option enabling MLIR multi-threaded compilation. We also take some precautions and make infrastructure updates:
- Verbose dump handlers are protected from races on the filename counter.
- A multi-threading flag has been added to the top-level qjit API.
- Timing stats are now printed to dedicated diagnostic streams.
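The first bullet can be illustrated with a minimal sketch (plain Python, not Catalyst's actual implementation): a shared dump-filename counter guarded by a lock, so concurrent compilation threads never hand out the same filename twice.

```python
# Illustrative sketch only (not Catalyst's actual code): protecting a shared
# dump-filename counter with a lock so concurrent threads get unique names.
import threading

class DumpCounter:
    """Hands out unique, ordered filenames for IR dumps."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def next_filename(self, stage):
        with self._lock:  # serialize the read-increment-format sequence
            self._count += 1
            return f"{self._count:02d}_{stage}.mlir"

counter = DumpCounter()
names = []
threads = [
    threading.Thread(target=lambda: names.append(counter.next_filename("canonicalize")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(set(names)) == 8  # every dump got a distinct filename
```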
Note: measurement results (internal document)
Benefits:
- Faster compilation of programs containing functions
- Capturing timings works from Python
Possible Drawbacks: Multi-threading issues might be uncovered.
Related GitHub Issues: closes https://github.com/PennyLaneAI/catalyst/issues/466
[sc-59511]
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 98.10%. Comparing base (a723cd1) to head (1bbf6e0).
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #655      +/-   ##
==========================================
- Coverage   99.95%   98.10%   -1.85%     
==========================================
  Files          20       69      +49     
  Lines        4444     9556    +5112     
  Branches        0      747     +747     
==========================================
+ Hits         4442     9375    +4933     
- Misses          2      147     +145     
- Partials        0       34      +34     
Hello. You may have forgotten to update the changelog!
Please edit doc/changelog.md on your branch with:
- A one-to-two sentence description of the change. You may include a small working example for new features.
- A link back to this PR.
- Your name (or GitHub username) in the contributors section.
> Possible Drawbacks: Some multi-threading MLIR issues might be uncovered?

Did you find any?

Nope, everything seems to be fine so far.

One thing we should test this with is the instrumentation. I don't think we have frontend tests for that feature yet though, until #597 is merged.

Do you want me to do this or are you going to check?
@dime10 @grwlf what do you guys think about turning it off by default for the release, and then turning it on by default after the release to test more thoroughly?

Great idea!
Looking into MLIR's documentation:

> The structure of the pass manager, and the concept of nesting, is detailed further below. All passes in MLIR derive from OperationPass and adhere to the following restrictions; any noncompliance will lead to problematic behavior in multithreaded and other advanced scenarios.

and later below:

> Must not modify the state of operations other than the operations that are nested under the current operation. This includes adding, modifying or removing other operations from an ancestor/parent block. Other threads may be operating on these operations simultaneously. As an exception, the attributes of the current operation may be modified freely. This is the only way that the current operation may be modified. (I.e., modifying operands, etc. is not allowed.)
I believe we have not been following this. Specifically, in gradients and other passes we add functions to the module.
I think we should not make this the default yet. But if we are willing to increase testing time, we could add a small set of compilation tests every so often to see if we catch any errors.
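The restriction quoted above can be sketched in plain Python (this is a conceptual analogy, not MLIR C++): rewriting each function in parallel is safe as long as every worker only touches state nested under its own function, while adding functions to the shared module from a worker mutates shared state and needs synchronization.

```python
# Conceptual sketch: why function-scoped rewrites parallelize safely while
# module-level mutation (e.g. adding functions) from workers does not.
from concurrent.futures import ThreadPoolExecutor
import threading

# A toy "module" mapping function names to lists of ops.
module = {"funcs": {"f0": ["op_a"], "f1": ["op_b"], "f2": ["op_c"]}}
module_lock = threading.Lock()

def rewrite_function(name):
    # Safe: each worker mutates only the ops nested under its own function,
    # mirroring the OperationPass restriction quoted from the MLIR docs.
    module["funcs"][name] = [op.upper() for op in module["funcs"][name]]

def add_helper_function(name):
    # What passes like 'lower-gradients' effectively do: add a function to
    # the module. Shared-state mutation, so it must be serialized.
    with module_lock:
        module["funcs"][f"{name}_helper"] = ["helper_op"]

with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(rewrite_function, ["f0", "f1", "f2"]))

print(module["funcs"]["f0"])  # ['OP_A']
```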
I'll be running this PR in a bit looking for potential issues (or a reason why they haven't been seen). Thanks!
It looks like one of the reasons why we are not taking advantage of parallelism is because all our passes act on a module scope:
```
include/Catalyst/Transforms/Passes.td:def CatalystBufferizationPass : Pass<"catalyst-bufferize"> {
include/Catalyst/Transforms/Passes.td:def ArrayListToMemRefPass : Pass<"convert-arraylist-to-memref"> {
include/Catalyst/Transforms/Passes.td:def CatalystConversionPass : Pass<"convert-catalyst-to-llvm"> {
include/Catalyst/Transforms/Passes.td:def ScatterLoweringPass : Pass<"scatter-lowering"> {
include/Catalyst/Transforms/Passes.td:def HloCustomCallLoweringPass : Pass<"hlo-custom-call-lowering"> {
include/Catalyst/Transforms/Passes.td:def QnodeToAsyncLoweringPass : Pass<"qnode-to-async-lowering"> {
include/Catalyst/Transforms/Passes.td:def AddExceptionHandlingPass : Pass<"add-exception-handling"> {
include/Catalyst/Transforms/Passes.td:def GEPInboundsPass : Pass<"gep-inbounds"> {
include/Gradient/Transforms/Passes.td:def GradientBufferizationPass : Pass<"gradient-bufferize"> {
include/Gradient/Transforms/Passes.td:def GradientLoweringPass : Pass<"lower-gradients"> {
include/Gradient/Transforms/Passes.td:def GradientConversionPass : Pass<"convert-gradient-to-llvm"> {
include/Mitigation/Transforms/Passes.td:def MitigationLoweringPass : Pass<"lower-mitigation"> {
include/Quantum/Transforms/Passes.td:def QuantumBufferizationPass : Pass<"quantum-bufferize"> {
include/Quantum/Transforms/Passes.td:def QuantumConversionPass : Pass<"convert-quantum-to-llvm"> {
include/Quantum/Transforms/Passes.td:def EmitCatalystPyInterfacePass : Pass<"emit-catalyst-py-interface"> {
include/Quantum/Transforms/Passes.td:def CopyGlobalMemRefPass : Pass<"cp-global-memref"> {
include/Quantum/Transforms/Passes.td:def AdjointLoweringPass : Pass<"adjoint-lowering"> {
include/Quantum/Transforms/Passes.td:def RemoveChainedSelfInversePass : Pass<"remove-chained-self-inverse"> {
include/Quantum/Transforms/Passes.td:def AnnotateFunctionPass : Pass<"annotate-function"> {
include/Test/Transforms/Passes.td:def TestPass : Pass<"test"> {
```
I think in order to take advantage, we would need to change these to `Pass<"foo", FuncOp>` or similar. Let me give it a try.
I am currently running into a weird error :sweat_smile: All tests pass except one for printing the IR. When the IR is printed, it gets printed nine times. Even if it were not printed nine times, the test would still fail.

Can you reproduce it locally? I've re-triggered a run of the code.

```
FAILED frontend/test/pytest/test_debug.py::TestPrintStage::test_hlo_lowering_stage - AssertionError: assert 'stablehlo.constant' not in 'module @fun...rn\n  }\n}\n'
```
Oh, it turns out I broke the PR with my 'simplification' commit. I have reverted it and added comments. The problem was that the dump handler was triggered by all passes instead of only by passes that end a pipeline.
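The fix described above can be sketched like this (the pipeline names below are hypothetical, chosen only for illustration; the pass names are real upstream/Catalyst passes): fire the verbose IR dump only for the pass that ends a pipeline, not after every individual pass.

```python
# Hedged sketch of the described fix: dump IR once per pipeline (after its
# final pass), rather than once per pass. Pipeline names are hypothetical.
pipelines = {
    "BufferizationPipeline": ["one-shot-bufferize", "buffer-deallocation"],
    "LLVMLoweringPipeline": ["convert-catalyst-to-llvm", "reconcile-unrealized-casts"],
}

def ends_pipeline(pipeline_name, pass_name):
    """Return True only when pass_name is the final pass of its pipeline."""
    return pipelines[pipeline_name][-1] == pass_name

# Collect the (pipeline, pass) pairs that would actually trigger a dump.
dumps = [
    (pipe, p)
    for pipe, passes in pipelines.items()
    for p in passes
    if ends_pipeline(pipe, p)
]
print(dumps)  # one dump per pipeline, not one per pass
```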