General overheads from MLIR are contributing significantly to compile time
Part of https://github.com/iree-org/iree/issues/11994
When running an entire compilation pipeline on a large program, some small overhead costs from MLIR are incurred so many times that together they show up as significant in traces ("death by a thousand [paper]cuts").
Operator Verification
One example of this is operator verification, which we can optionally disable in iree-compile with the flag --verify=false. (See also: https://mlir.llvm.org/docs/DefiningDialects/Operations/#custom-verifier-code)
I measured one program taking 2m9s with verification and 2m3s without, so disabling verification saved about 6s (roughly 5% of the total). At the start of compilation (Input and ABI phases), we haven't yet subdivided the program into individual dispatches/executables, so verification runs over the entire input program after every pass. This makes even no-op passes take on the order of 25ms. Some passes, like Convert1X1FilterConv2DToMatmul, spent over three quarters of their runtime in verifiers (130ms of 170ms).

Most of that verifier time appears to come from linalg ops with affine maps.

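To make the scaling concrete, here is a toy C++ model of a pipeline that re-verifies the whole operation a pass ran on after each pass, as MLIR's pass manager does when IR verification is enabled. It is not MLIR's actual `PassManager`: `ToyModule`, `ToyPass`, and `runPipeline` are hypothetical names, and the `verifyAfterEachPass` flag is only analogous to `--verify=false`. Because each module-level pass triggers a walk over every op, verifier work grows as passes × ops even when the passes themselves do nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a module: only its op count matters here.
struct ToyModule {
  std::size_t numOps;
};

// Hypothetical pass signature: may mutate the module in place.
using ToyPass = std::function<void(ToyModule &)>;

// Runs each pass on the whole module. When verifyAfterEachPass is true
// (analogous to leaving verification on in iree-compile), every op is
// "verified" after every pass. Returns the total number of op visits
// made by the verifier.
std::size_t runPipeline(ToyModule &module, const std::vector<ToyPass> &passes,
                        bool verifyAfterEachPass) {
  std::size_t opsVerified = 0;
  for (const ToyPass &pass : passes) {
    pass(module);
    if (verifyAfterEachPass) {
      // Module-level verification walks every op, whether or not the
      // pass changed anything.
      opsVerified += module.numOps;
    }
  }
  return opsVerified;
}
```

In this model, three no-op passes over a 1000-op module cost 3000 op verifications with verification on and none with it off, which mirrors why passes that run on the whole program early in the pipeline pay the most.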
General mlir:: functions
Here are some statistical breakdowns of time spent in mlir:: symbols, taken from a sampling profile:


(As with https://github.com/iree-org/iree/issues/11994, I'm not sure how actionable this issue will be on its own, but I'm hoping to at least start some conversations and generate interesting reproducers / traces / profiles that more specific tasks can reference.)
Upstream work like https://discourse.llvm.org/t/rfc-introducing-mlir-operation-properties/67846 (from @joker-eph, discussed at the Open MLIR Meeting earlier today) sounds like it should help here.
Idea for a thousand-papercuts problem: use Pass::initialize more for frozen pattern lists and dynamic pipelines. Today most of our passes create their pattern lists and pipelines in runOnOperation(), which means that if they run, for example, on every function in every executable variant (thousands or more), we repeat a lot of that work on the critical path. Instead, we could pass any information required to construct those resources as options/args and build them once in initialize. If we did that throughout the stack, for example in codegen, we'd have one initialization per unique ExecutableTargetAttr instead of one per executable/function/etc. being translated. To keep iree-opt tests and similar tools easy to work with, we could make this optional with a Pass shim providing getCachedPatterns/getCachedPipeline, which would build the resources at initialization time when possible and on demand otherwise.
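A minimal single-threaded C++ sketch of the shim idea follows. Everything in it is hypothetical: `ToyPatternList` stands in for a frozen pattern list, the pattern names are placeholders, and `CachedPatternPass` only mimics the shape of MLIR's `Pass::initialize` / `runOnOperation` split without using the real MLIR API. The point it shows is that the pattern list is built once per pass instance, eagerly in `initialize()` when possible and lazily on first use otherwise (the iree-opt case).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a frozen pattern set; here just pattern names.
using ToyPatternList = std::vector<std::string>;

// Hypothetical pass shim: builds its patterns once, either eagerly in
// initialize() or lazily on first use, instead of on every run.
class CachedPatternPass {
public:
  // Counts how many times the (expensive) pattern list was built.
  std::size_t buildCount = 0;

  // Called once per pass instance before any run (cf. Pass::initialize).
  void initialize() { getCachedPatterns(); }

  // Returns the cached patterns, building them on demand if initialize()
  // was never called (e.g. when driven from a standalone test tool).
  const ToyPatternList &getCachedPatterns() {
    if (!patterns) {
      patterns = std::make_unique<ToyPatternList>(buildPatterns());
      ++buildCount;
    }
    return *patterns;
  }

  // Called once per function; reuses the cached list instead of rebuilding.
  std::size_t runOnFunction() { return getCachedPatterns().size(); }

private:
  // Placeholder for the expensive construction work.
  ToyPatternList buildPatterns() const {
    return {"placeholder-pattern-a", "placeholder-pattern-b"};
  }

  std::unique_ptr<ToyPatternList> patterns;
};
```

Running it on 1000 functions after `initialize()` builds the list once rather than 1000 times, and skipping `initialize()` still builds it only once, on the first run. Because this sketch is single-threaded, a real shim shared across threads would need synchronization on the lazy path (for example `std::call_once`).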