[Unity][MSC][Tracking Issue] Introduction to Multi-System Compiler

Open Archermmt opened this issue 2 years ago • 7 comments

  • [x] [M0] Build MSCGraph core parts. Enable translation between Relay, Relax and MSCGraph without losing information.
    • [x] [M0.1] Passes for set name and layout for expressions (src/contrib/msc/transform)
    • [x] [M0.2] MSCGraph core (src/contrib/msc/core/ir/graph && python/tvm/contrib/msc/core/ir/graph)
    • [x] [M0.3] MSCGraph Builder (src/contrib/msc/core/ir/graph_builder)
    • [x] [M0.4] Codegen (src/contrib/msc/core/codegen, src/contrib/msc/framework/tvm/codegen)
    • [x] [M0.5] Translation test (relax/relay test && related helper modules in python)
  • [x] [M1] Finish RuntimeManager for relax and torch, so that a compiling process can be tested based on MSCGraph.
    • [x] [M1.1] Add translate && codegen for torch
    • [x] [M1.2] Add translate && codegen for tensorflow
    • [x] [M1.3] Add codegen for tensorrt
    • [x] [M1.4] Add Runner and test with relax
    • [x] [M1.5] Add Runner and test with torch
    • [x] [M1.6] Add Runner and test with tensorflow
    • [x] [M1.7] Add Runner and test with tensorrt
  • [x] [M2] Use msc.runtime.Manager to manage the compiling pipeline && tools.
    • [x] [M2.1] Add Manager for compile pipeline
    • [x] [M2.2] Add pruner for model pruning
    • [x] [M2.3] Add tracker for tracking layer data
    • [x] [M2.4] Add quantizer for model quantization
  • [x] [M3] Add MSCGym, enable auto compression. Add distiller, enable knowledge distillation.
    • [x] [M3.1] Add distiller for model distillation
    • [x] [M3.2] Add gym for pruning and quantization, enable auto prune/quantize
  • [ ] [M4] Add plugin builder, enable plugin wrap in different frameworks.
    • [x] [M4.1] Add plugin && plugin_builder, enable build and test in different frameworks.
    • [ ] [M4.2] Enable plugin with manager, test plugins in compile pipeline.
  • [ ] [M5] [Optional] Add MSCWrapper as compression toolchain.

cc @quic-sanirudh

Archermmt avatar Jul 05 '23 02:07 Archermmt

Introduction @ https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251

Archermmt avatar Jul 05 '23 02:07 Archermmt

TODO: add tests for M0.2 after M0.3

Archermmt avatar Aug 18 '23 03:08 Archermmt

Discussion on translating relay to relax without losing information: https://discuss.tvm.apache.org/t/msc-translate-relay-to-relax-without-loss-info/15650

Archermmt avatar Sep 09 '23 12:09 Archermmt

I'm somewhat concerned about the relay -> python codegen -> relax code path used in tvm.contrib.msc.framework.torch.frontend.translate.from_torch when via_relax=False. This is a duplication of the serialization/parsing used in TVMScript (tvm.script), and can cause CI failures (e.g., https://github.com/apache/tvm/pull/15783) due to this duplication.

While I agree with the need for an operator-level conversion from relay to relax, I think it should be done through extending the existing relax.testing.relay_translator.from_relay converter rather than having an additional python code-generator.

Lunderberg avatar Sep 26 '23 18:09 Lunderberg

@Lunderberg Sorry for the late reply... I've checked the failures. It seems the tril/triu methods have been changed; I'll fix them in later PRs.

Here are the reasons for building a duplicate "relay -> relax" converter:

  1. An operator-level conversion is needed, as you said. This is essential when developers want to use relay-based features (like me, testing tensorflow).
  2. Using relay also has some problems when optimizing the model, especially in quantization, pruning, parameter reuse, and training. The real process in test_translate_torch.py from relay is relay -> MSCGraph -> relax, where MSCGraph is the basic DAG structure for model compression. The via_relax=False path only shows an example of using MSC with relax and relay; it is not meant to be a converter between relay and relax. When the final solution for the "operator-level conversion from relay to relax" is done, I will change the relay-relax method accordingly.

Thanks for watching!

Archermmt avatar Oct 13 '23 05:10 Archermmt

@Archermmt No worries, and I've been slow responding as well.

After thinking on it, I think my primary concern is in the method used for the MSCGraph -> relax conversion, which is done by first producing a python string, then calling exec on the generated string. This makes it very difficult to tell where an error has been introduced, as any errors in this process are thrown at runtime while executing the generated string.

Instead of generating a string to use the Python API, I think the MSC to Relax conversion should instead be done by directly calling the C++ APIs. This would expose any errors during the C++ compilation, rather than delaying them until runtime.
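To make the concern concrete, here is a minimal toy sketch of the two styles. It is not MSC or TVM code: the `OPS` table, the node format, and the helper names are all hypothetical and exist only for illustration. A builder that validates ops fails while the graph is being built, whereas generated source with the same typo is produced and loaded without complaint and only fails when the generated function runs. (The analogy is loose: a C++ converter would catch many mistakes at compile time, which plain Python cannot show.)

```python
# Toy illustration only: OPS, the node format, and all helper names are
# hypothetical and are not part of MSC or TVM.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def build_direct(nodes):
    """Validate every op while building; an unknown op fails here (eager)."""
    for name, op, _ in nodes:
        if op not in OPS:
            raise ValueError(f"unknown op {op!r} at node {name!r}")

    def run(x):
        env = {"x": x}
        for name, op, args in nodes:
            env[name] = OPS[op](*(env[a] for a in args))
        return env[nodes[-1][0]]

    return run


def generate_source(nodes):
    """Emit Python source; nothing is checked while the string is produced."""
    lines = ["def run(x):"]
    for name, op, args in nodes:
        lines.append(f"    {name} = {op}({', '.join(args)})")
    lines.append(f"    return {nodes[-1][0]}")
    return "\n".join(lines)


def load_generated(source):
    """Execute generated source with the op table as its globals."""
    namespace = dict(OPS)
    exec(source, namespace)
    return namespace["run"]


good = [("a", "add", ["x", "x"]), ("b", "mul", ["a", "x"])]
bad = [("a", "add", ["x", "x"]), ("b", "mull", ["a", "x"])]  # typo in op name

try:
    build_direct(bad)  # fails at build time
except ValueError as err:
    print("eager:", err)

lazy_run = load_generated(generate_source(bad))  # no error yet
try:
    lazy_run(3)  # fails only when the generated function is called
except NameError as err:
    print("lazy:", err)
```

With the `good` graph both paths compute the same value, so the difference is purely in when a mistake becomes visible.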

Lunderberg avatar Oct 18 '23 14:10 Lunderberg

@Lunderberg Hmm... I've also thought about which method is better: 1. Convert in C++ to enable eager error detection; 2. Convert by string generation to enable independent loading. Both have advantages and disadvantages.

The first method (let's call it the converter, in either C++ or Python), like relax.builder, can check and normalize ops while building the graph, but that limits the deployment possibilities. For example, if I need to compare results between an old TVM version without relax and the new unity version (which may be a real task for me...), I would have to spend a lot of time setting up environments and dumping test data with the converter solution. MSC is also designed to target not only relax, but also torch/torch2, tensorflow/tf2, tensorrt, and so on. Considering that models are dispatched to different frameworks and environments, the converter may not be a good solution.

The second method (let's call it string generation), like the cutlass codegen, first generates strings and then processes them into a kernel/model/engine. That means the codegen process skips checking and normalization, which may lead to lazy error detection. However, the strings can be saved as script/C++ files and loaded in any environment. This method separates codegen from loading, which is essential for fast model release, especially on the cloud (where different environments and frameworks are used).
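As a rough sketch of what "separating codegen and loading" can look like, the toy example below writes generated source to a file and loads it later using only the Python standard library. The source string, the file name `generated_model.py`, and the helper names are hypothetical; this is not how MSC stores or loads its generated scripts, only the general pattern.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical output of a codegen step; a real generator would emit
# framework-specific model code instead.
GENERATED_SOURCE = '''
def run(x):
    a = x + x
    return a * x
'''


def save_model_script(source, folder):
    """Codegen side: persist the generated source as a plain script."""
    path = Path(folder) / "generated_model.py"
    path.write_text(source)
    return path


def load_model_script(path):
    """Loading side: import the script without needing the generator."""
    spec = importlib.util.spec_from_file_location("generated_model", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as folder:
    script = save_model_script(GENERATED_SOURCE, folder)
    model = load_model_script(script)
    result = model.run(3)

print(result)
```

The loading side only needs the script and whatever runtime the script itself imports, which is the property that makes it easy to move between environments.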

As mentioned in the RFC (https://discuss.tvm.apache.org/t/rfc-unity-msc-introduction-to-multi-system-compiler/15251), MSC currently targets model optimization problems based on relax. That means the codegen part should be able to use features of different frameworks, such as training, weight reusing/reloading, distributed systems, and so on. Currently I only have experience "describing" these features in Python with string generation (I'm not that good at C++ -_-).

To partially solve the error detection problem, the codegen in MSC generates not only the model but also a unit test. Using the unit test, developers can locate and fix problems efficiently.

I think we can leave enabling a C++ converter for MSC as a TODO. After the main target is reached, I'll consider building a converter, or maybe directly using relax as the core IR.
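A minimal sketch of the idea, assuming nothing about MSC's actual generated tests: the generator emits a `unittest` case next to the model source, so a problem in the generated code surfaces as a named failing test instead of an error deep inside an `exec` call. The function names and the reference cases here are made up for illustration.

```python
import unittest


def generate_model_source():
    """Hypothetical codegen output for a tiny model."""
    return "def run(x):\n    a = x + x\n    return a * x\n"


def generate_test_source(cases):
    """Emit a unittest class that checks `run` against reference values."""
    lines = ["import unittest", "", "class TestGenerated(unittest.TestCase):"]
    for index, (arg, expected) in enumerate(cases):
        lines.append(f"    def test_case_{index}(self):")
        lines.append(f"        self.assertEqual(run({arg!r}), {expected!r})")
    return "\n".join(lines) + "\n"


# Load the generated model and its generated test into one namespace.
namespace = {"__name__": "generated_tests"}
exec(generate_model_source(), namespace)
exec(generate_test_source([(3, 18), (0, 0)]), namespace)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(namespace["TestGenerated"])
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print("generated tests passed:", outcome.wasSuccessful())
```

This does not make error detection eager, but it narrows a failure down to a specific generated case.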

Archermmt avatar Oct 18 '23 23:10 Archermmt