[RFC] Relax Upstreaming
This RFC proposes to upstream the core foundation of Relax including its IR, compilation flow, and runtime, to address the critical needs identified by the TVM community, and enable a cohesive (but optional) TVM Unity Connection milestone.
Hi @leandron, thanks for your feedback! :)
We share a common goal of minimizing disruption while incrementally improving TVM. One of the main questions is how to bring in the improvements. That’s indeed what we have carefully thought about.
One thing we found in building the unity connection is that we need to co-design things together. Take first-class symbolic shape support as an example:
Suppose we want to apply a flatten operation to flatten a tensor, here is the symbolic shape representation in Relax:
b: Tensor[(m, 224)]
a: Tensor[(m * 224, )] = flatten(b)
In Relay, we have ? to denote an unknown dimension. So the above program in Relay is:
b: Tensor[(?, 224)]
a: Tensor[(?, )] = flatten(b)
Without symbolic shapes, we lose the shape relation between tensor a and tensor b, which closes off optimization opportunities. For example, memory planning could reuse the same memory for a and b, since at compile time we know they occupy the same amount of memory.
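To make the contrast concrete, here is a minimal sketch in plain Python (not the actual Relax or Relay API; `numel` and `can_share_memory` are hypothetical helpers) of how a memory planner can prove equal buffer sizes under symbolic shapes but not under an unknown-dimension marker:

```python
from collections import Counter

ANY = "?"  # Relay-style unknown dimension: carries no relation to other shapes


def numel(shape):
    """Element count as (constant factor, multiset of symbolic factors),
    or None if any dimension is unknown. A composite dimension such as
    m * 224 is written as the tuple ("m", 224)."""
    factors, const = Counter(), 1
    for dim in shape:
        for part in (dim if isinstance(dim, tuple) else (dim,)):
            if part == ANY:
                return None  # unknown dim: size is unprovable
            if isinstance(part, int):
                const *= part
            else:
                factors[part] += 1
    return (const, frozenset(factors.items()))


def can_share_memory(shape_a, shape_b):
    """A planner may reuse one buffer for the other only when the element
    counts are provably equal at compile time."""
    na, nb = numel(shape_a), numel(shape_b)
    return na is not None and na == nb


# Symbolic shapes keep the relation: b: (m, 224), a = flatten(b): (m * 224,)
assert can_share_memory(("m", 224), (("m", 224),))
# Relay-style Any loses it: (?, 224) vs (?,) cannot be proven equal
assert not can_share_memory((ANY, 224), (ANY,))
```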
Supporting this requires introducing native symbolic shape support as opposed to a separate mechanism. It's worth noting that from the example above, the shape mechanism in Relax is very different from what we currently have in Relay.
Additionally, the other improvements also benefit from first-class symbolic shape. For example, call_tir signature has output_shape which can be represented by symbolic shape (since TensorIR supports symbolic shape), and TE language is based on symbolic shape hence we can directly generate PrimFunc with emit_te (see Relax-TE integration section). Introducing each component separately will make the design less cohesive.
Directly introducing symbolic shape support into the existing IR would mean a one-step transition to the current Relax, which is not the incremental improvement we hope for.
Relax can be viewed as complementary to Relay. Relay focuses on high-level op transformations, while the current Relax passes focus on TIR-graph co-transformations that can enable flexible fusion and layout rewrite, which is hard to achieve in Relay.
As of this RFC, we do not seek to change the default build pipeline or replace Relay. In this RFC, we only introduce Relax as an optional component for those community members who need it. It is a common practice in other ML frameworks, for example, PyTorch brought in TorchFX as an optional (vertical) component to support graph exporting, while maintaining TorchScript. We totally agree that TVM default flow evolution is important, and we should carefully discuss that with the community in future RFCs. Evolving default build has been briefly discussed in Establish TVM Unity Connection — A Technical Strategy, and there will be an upcoming tvm.compile RFC to discuss the long-term strategy to consolidate default build flows in TVM.
Thanks @leandron and @ekalda for the comments. We all agree that we are trying to improve the graph-level IR of TVM; the point of contention is whether we can enhance Relay to support the features from Relax. Let's discuss it directly and focus on the technical points themselves.
First of all, I'd like to list the most critical features that Relax wants to introduce:
- Dynamic shape support, to be specific symbolic shape representation;
- A representation for TVMUnity, i.e. a cross-layer abstraction for optimization;
- Customizable compilation flow and operator support.
In my opinion, it's hard to incrementally update relay to support them.
G1: Dynamic shape support
To be specific, Relax can represent and deduce symbolic shapes rather than using Any. However, if we introduce dynamic shapes to Relay, there will be two competing representations for shapes (symbolic shape and Any), which is undesirable.
G2: A representation for TVMUnity
TVMUnity is an important feature for unified optimization across graphs, tensor computation, and libraries. The build flow of Relay is a one-way path: relay -> tir/libraries -> runtime module, while TVMUnity enables IRModule(graph+tir+libraries) -> IRModule transformations, which gives users more flexibility to choose the backend (use codegen or call libraries) even after tuning. I'm not sure this is possible for Relay if we keep the original workflow.
G3: Customizable compilation flow and operator support.
Customizing operators and backends is really common in production. There are 7 steps to add a new operator to Relay. In Relax, we only need 2 steps:
- write how the op is computed (either TIR or libraries work),
- use call_tir to represent it in the IRModule.
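As a rough illustration of these two steps (a plain-Python sketch with hypothetical names, not the actual Relax call_tir signature), the convention is destination-passing style: the low-level function writes into a pre-allocated output, and call_tir allocates that output from the declared shape:

```python
def my_relu(inputs, outputs):
    # Step 1: define how the op is computed. This stands in for a TIR
    # PrimFunc or a library call; it writes results into `outputs`.
    outputs[:] = [x if x > 0 else 0.0 for x in inputs]


def call_tir(func, args, out_shape):
    # Step 2: call_tir allocates the output from the declared (possibly
    # symbolic, here concrete) shape, invokes the low-level func in
    # destination-passing style, and returns the output tensor.
    out = [0.0] * out_shape[0]
    func(args, out)
    return out


assert call_tir(my_relu, [-1.0, 2.0, 3.0], (3,)) == [0.0, 2.0, 3.0]
```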
Additionally, other compilation customizations (e.g. BYOC, AOT, customized fusion) are also more straightforward with Relax. Please see the TVM Unity Connection RFC.
In short, a new IR is a reasonable way to support the above features IMO. And I'm open to hearing more ideas from the community.
Thanks for the RFC. Although I wasn't involved in the actual Relax development, I've been attending the weekly open design review meetings for a while, and I'm glad I could share our experience to help improve the Relax design. Thus, I don't have specific questions about the design.
Regarding the point mentioned above about whether we really need a brand new IR to replace Relay: in fact, we at AWS already attempted to build a compiler-based training framework, RAF, by extending Relay. Based on our experience working on RAF for the past 1.5-2 years, we agree with the Relax group that Relay does have some limitations in its design, and these limitations prevent us from easily adding new features such as dynamic shape, flexible fusion, etc. Note that I'm not saying it's impossible to implement these features by extending Relay. My point is that even if it's possible, it is hard to keep a clean system/infra without a large-scale refactoring. To me, it is even safer to build a new infra from scratch, so that existing workloads and use cases won't be affected at all. This is also the reason why we noticed Relax and tried our best to be involved in the early-stage design in the first place.
Meanwhile, in terms of maintaining two IRs, I don't think this would add much overhead to the community, because the two IRs are basically independent and can be developed separately. In other words, it's up to each developer whether to work on Relay or Relax. As long as there is still a number of Relay developers in the community, we shouldn't deprecate it. On the other hand, if people find over time that Relax is better, developers will gradually move to Relax. Only then should we bring Relay deprecation to the table, just like nnvm in the past.
Thank you @leandron @ekalda for the questions, and @zhiics, @slyubomirsky, @Hzfengsy, @sunggg for the discussion!
As a long-term contributor since 2018, the pre-Relay era, and the initiator and one of the top 2 contributors of RAF (https://github.com/awslabs/raf/), the TVM-based training framework, I would love to share my perspective and slight concern about TVM development at this moment, in 2022.
While TVM is a decent auto-tuner for static-shape workloads, and the latest work on auto-tensorization has further boosted its performance with microkernel tuning, there has been strong demand from the community to allow TVM to do more, which, as @YuchenJin listed, includes:
- Unified abstraction
- Dynamic shape support
- Dataflow block and first-class side effect handling
- Non-inference workloads
As a community, we do encourage everyone to understand different perspectives and empower each other, and I believe this is the way for us to grow.
Technically, just wanted to address a meta question here: why is it less feasible to gradually upgrade Relay?
- Conflicting design philosophy: Relax follows a completely different design than Relay, with mutually conflicting assumptions and ideas. For example, having two conflicting shape mechanisms in the system would effectively mean passes have to handle both of them.
- Engineering challenge: the design differences create hurdles for incremental updates. For example, if we want to move away from the assumption that the IR is side-effect free, all the passes written under the old assumption automatically become invalid or wrong, because the assumption is no longer respected.
- Stability concern: Even if we make surgical incremental enhancements to Relay by introducing breaking changes piece by piece, there is still a stability concern. Consider downstream vendors whose forks depend on upstream Relay: if Relay's assumptions break over time, it becomes harder for them to maintain their forks.
Alternatively, we believe having Relax as a separate path is a cleaner and more maintainable approach - gradually bringing over some of the passes from the bottom up is incremental from an engineering standpoint and guarantees that the Relay code path stays stable.
@YuchenJin
Relax can be viewed as complementary to Relay. Relay focuses on high-level op transformations, while the current Relax passes focus on TIR-graph co-transformations that can enable flexible fusion and layout rewrite, which is hard to achieve in Relay.
I like this separation of work between Relay and Relax. We have many Relay passes that work all right and that it wouldn't make much sense to reimplement in Relax.
But if Relax is supposed to be complementary to Relay, is the name "Relax", as "Relay Next", still a good choice? "Relay Next" strongly suggests that Relax is something that is going to replace Relay, like we did with nnvm. I'm still not entirely clear whether the plan is to eventually deprecate Relay, or whether Relay and Relax are going to coexist for the foreseeable future.
Thank you, everyone, for the discussions here. Let us take a step back and look at the non-technical parts of the conversation. A lot of our discussions come from two goals:
- G0: Maintaining a stable evolution solution for some of our common use-cases
- G1: Welcoming new improvements, landing our technical commitments in a timely fashion, continuing to reinvent ourselves, and welcoming new community members who have new use cases.
Both goals are very important. G0 ties to our ability to continuously support our current use cases. G1 is also essential to our viability as a solution, so we can grow as a community and stay competitive in a fast-evolving machine learning compilation landscape.
Enabling both has always been an important theme for long-lived projects. Deep learning frameworks are a common reference point. Usually, this is done in roughly three phases:
- S0: Introduction of a new feature/component as an optional module.
- S1: Evolving the overall solutions to make use of the new component.
- S2: Considering deprecation of some of the existing solutions, or evolving the solutions toward a consolidation point.
Each stage contains a different level of commitment and would normally entail different levels of gating criteria as we look at them.
For example, PyTorch introduced TorchFX as an optional module that supports graph tracing and export. It had some overlapping capabilities with TorchScript. The PyTorch community is collectively evolving some of the compilations (TorchDynamo) to make use of FX. As of now, there is not yet an announcement of S2 from the community.
Encouraging S0 and making it easy helps us enable G1. Too high a barrier here can discourage community contributions and result in mainline lacking the latest features, eroding our competitiveness. This is especially important given that the landscape of machine learning compilation remains open, and the ability to support symbolic shapes and training in a timely manner helps bring in users and contributors who would otherwise turn to alternatives.
G0 is equally important here. In many cases, it boils down to making careful and informed decisions regarding evolution (S1 and S2), and to making sure that at the S0 stage there are limited disruptive changes to the existing infrastructure. Importantly, not every module/feature has to go through all stages. And in common practice, the decisions at each stage are usually not made at the same time.
We can find examples of S0 cases in TVM as well. For example, USMP was initially designed for specific cases like AOT. We welcomed these improvements early to unblock needs in embedded settings. Through USMP we found the need for tir.alloc_const, which relates to evolving the existing infra (S1). As a result, we had a more in-depth discussion. Additionally, we are bringing the effort to further enable USMP in broader settings as part of S1. At some point, we might consider consolidating all memory allocations as S2 – note that many community members are collectively working toward that goal, but we are not yet at a point to make such a decision. As another example, we enabled cascaders specifically designed for micro-NPUs, which overlap in domain with the arithmetic affine module, but were nevertheless brought in without consolidation because we believed there was enough interest and maintenance support for the module. Finally, the unpacked_api was enabled specifically for extremely low-resource settings, and we allowed S0-level inclusion despite some inconsistency with the packed-func API.
Of course, we do not want to enable arbitrary additions to the codebase, which ties back to the maintenance-overhead concern. One question we want to ask here is whether the module has enough support from the community to allow continued maintenance. Additionally, we should weigh the engineering support gained by welcoming community members who are interested in these needs and would otherwise look elsewhere.
Our overall thought process and decision timing can be different for each stage – and they should be, so that we can enable both G0 and G1. Nor do all modules have to go through all the stages.
For S0, we would ask whether there are enough champions in the community with a self-contained plan. For important features, we would expect, say, more than three committers who can champion the module, plus significant community support to maintain it. Additionally, S0 should be made as minimally disruptive (with respect to the current infrastructure) as possible. To encourage G1, we can overlook some level of duplication (as with TorchFX and TorchScript, and USMP and the other allocators when they landed as S0), considering the additional community support we gain to maintain them.
S1 and S2 would involve more careful discussions and coordination with greater amounts of details on some of the key points. Likely, they will also happen at a different time point so we can make informed decisions.
This particular RFC is at the S0 stage and intentionally made to be so. As the RFC stated, there is no proposal to make S1/S2 decisions at this RFC. Many of our current discussions are around S1/S2 – the future evolution of the system. They are extremely helpful discussions to have to set up the context and help us improve the design, but not necessarily decisions we have to make immediately. Let us think about the broader community members we can empower and bring in through enabling the S0 improvement.
Thank you, everyone, for the discussions so far, and let us work together to enable our community.
On the point about potentially incorporating symbolic shapes into Relay, I would like to hear more detail about how it can be done with Relay's system of accumulating type constraints and solving them simultaneously. If we were to handle dynamic shapes in Relay, we would need to define semantics for how shape variables are scoped and how assignments are handled, how they can be processed during the solving of type constraints, and what happens if symbolic shape expressions cannot be concluded to be the same at compile time. If this can be neatly incorporated into Relay, then it might make sense to pursue. I would be happy to brainstorm on that issue.
Having taken onboard the feedback from community members (@leandron, @ekalda, @Mousius, @masahi), a number of us involved in this RFC (@YuchenJin, @jwfromm, @tqchen, @areusch, @mbaret, @jroesch, @tmoreau89) feel it’s necessary to be explicit about the scope of this proposal, and we apologize to those reviewing that this was not present in the original text.
- Acceptance of this RFC doesn't mean there is an agreement to eventually deprecate Relay and replace it with Relax. It only permits bringing the development that's currently occurring on the Relax fork into the TVM repo. This will improve the accessibility of that important work for community stakeholders who rely on it, as well as bring Relax under TVM project governance.
- If at a later stage it's found that individual features from Relax are desired in the Relay compiler (e.g. dynamic shapes, TVMScript support), design discussions and RFCs must take place to determine the best way to implement those features. Acceptance of this RFC gives no preference to Relax as the solution, and so evolving Relay would remain firmly on the table in those discussions.
The RFC has been accordingly amended to include the above commitments in the "Motivation and goals" section, which we hope addresses some of the valid concerns expressed so far.
Thanks everyone for the discussions so far. There have been a lot of conversations around clarifications, and many issues have been answered. Two of the main concerns raised so far seem to be:
- rationales and alternatives.
- overall scope and execution (relation to relay deprecation)
In this case, @YuchenJin has:
- Updated the RFC with the rationales and alternatives
- Clarified that the proposal is scoped to checking in the module as optional, neither disrupting nor changing existing modules (nor implying deprecation of Relay).
I think the above two actions have addressed the concerns raised with respect to the particular scope of this RFC. @leandron @ekalda @Mousius, please follow up and see if you have more questions.
Let us continue to follow the overall guideline of providing grounded technical justifications and bringing constructive discussions, knowing that some of the high-level taste arguments can be subjective.
In those cases, thinking about the community-over-code principle can go a long way. The particular Unity connection strategy has already received majority support from the community (from more than 8 organizations). Let us think about the following question: how can we collectively empower those members together?
Thanks @YuchenJin for updating this RFC!
From the use cases I have observed at ByteDance, the symbolic shape capabilities allow TVM to handle dynamic workloads that cannot be handled in other frameworks, and to be more widely adopted. And we can quickly enable iterations of the unity connection as stated in RFC https://github.com/apache/tvm-rfcs/pull/91, by bridging the high-level representation with the low-level TensorIR.
Based on our experience at NIO, dynamic shape support in Relax is extremely important for us. In fact, we have done a lot of work on Relay trying to cover dynamic shape support for our use cases; however, the lack of first-class support for symbolic dynamic shapes still means that some ops/patterns cannot exist in our models. First-class support for symbolic dynamic shapes is extremely important for us, especially since some models are inherently dynamic in their inputs/outputs, for example point cloud models. Relax is what I've been waiting for. With Relax, dynamic outputs in point cloud or object detection models and dynamic-batch models can be handled well, both in terms of functionality and of performance.
To anyone who has doubts, I recommend reading this: https://github.com/tlc-pack/relax/wiki/Relax-Shape-Computation-Design.
Thanks @YuchenJin, @tqchen, and the many Relax authors for bringing this. I really appreciate this work 👍; in fact, I am evaluating Relax internally and want it to solve our problems ASAP.
In addition to the use cases and experience I've mentioned previously, I want to further highlight that symbolic shape support has become even more important in recent months for us at AWS, mainly due to the requirements of deploying decoder models (e.g., GPT). Since text generation is a naturally dynamic-shape workload in terms of sequence length and batch size, padding everything is impractical due to its inefficiency, as shown in recent publications. It is extremely important for TVM to support this case if we intend to remain a state-of-the-art deep learning compiler.
@Mousius In this case, @YuchenJin's reply clearly articulated that there is close co-design between these factors, and adopting dynamic shape alone would imply a one-step jump to Relax -- which is not incremental. The data structure change would come with a set of supporting infrastructure and co-designed components, including shape inference among other things.
Of course, if we do not maintain both behaviors to allow an incremental transition, then it is equivalent to a relay=>relax bridge that would allow part of the pipeline to go through in a more incremental fashion. This approach is consistent with what is being proposed in the RFC (with clear data structure boundaries).
I would suggest we not trivialize the argument here, as it is easier to ask why not do X than to really go and do things. The learnings @YuchenJin gained through concrete code are certainly very convincing, as are the learnings of why not to do things otherwise.
This is a case where subjective disagreements happen. In such cases, I would again encourage us to take the community into consideration, along with the fact that these things do not disrupt other existing flows.
@Mousius In this case, @YuchenJin's reply clearly articulated that there is close co-design between these factors, and adopting dynamic shape alone would imply a one-step jump to Relax -- which is not incremental. The data structure change would come with a set of supporting infrastructure and co-designed components, including shape inference among other things.
Co-design doesn't mean we have to land everything at once; it's often easier to prototype several features together and then break them down when we get to delivery - I believe that is more incremental and understandable than landing large changes with several dimensions at once.
I'm also unsure how it's a one-step jump to Relax as @YuchenJin has demonstrated that the functionality of Relax is split into several different pieces. I understand that in order to land the entire proposition of Relax it may take several pieces landing, but that's natural for a large change such as this?
Of course, if we do not maintain both behaviors to allow an incremental transition, then it is equivalent to a relay=>relax bridge that would allow part of the pipeline to go through in a more incremental fashion. This approach is consistent with what is being proposed in the RFC (with clear data structure boundaries).
They're not equivalent as we can avoid adding additional variations in our code by making a breaking change early and iteratively integrating the other various pieces of Relax into existing code. @slyubomirsky also expressed an interest in investigating the dynamic shapes in isolation so I believe this could have some momentum?
As an individual Relax contributor from UCSD, I don’t feel our community has created an environment that welcomes new contributors and new contributions.
I am happy to see different opinions in the discussion, but it's really shocking and disappointing that we are afraid of moving forward even with a large amount of community support, and for such a technically strong optional module. 😞
According to several community members, Relax can solve the needs they care about. It's also made clear by @YuchenJin in the RFC that Relax is an optional module which does not disrupt the current TVM flow, and that there is no intention to replace Relay. In most open-source projects, evolution is done in a phased manner. I am very surprised that such a module, one that can solve current pain points of TVM, has been blocked by subjective opinions.
It's hard to imagine that everyone who wants to contribute to TVM, especially those who do foundational work like Relax, needs to make predictions about existing-flow deprecation and timelines; in my opinion this discourages people from contributing.
Thanks everyone for the feedback. One thing that we seem to agree on is that there is a strong need to support symbolic shape use cases for TVM, as represented by many of the folks who chimed into this thread.
Hopefully, we all agree that there is a strong need to support robust and high-quality dynamic shape compilations, and that is not being supported in the current TVM.
One of the questions is whether or not we can do that by separating the changes of C0, C1, C2 in Rationale and Alternatives.
In order to provide value to users, we need end-to-end dynamic shape compilation for specific environments of interest (for example, it is less relevant at the moment for micro settings). C0, C1, and C2 are necessary to support a minimal but robust dynamic compilation:
- We need to effectively lower the symbolic shape code to a sequence of calls into symbolic TensorIR functions and libraries (when we cannot generate TensorIR functions). This means in order to do so we need C1: Add first-class support for interactions with TensorIR and TVM FFI.
- In order to further lower some of the shape handling (match_shape), we need to generate code that contains side effects (it allocates a buffer and writes into it). This naturally means we need C2: Add first-class dataflow and side effect handling.
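As a rough illustration of why lowering match_shape produces effectful code (a plain-Python sketch, not the Relax implementation): matching destructures a runtime shape against a symbolic pattern, binding shape variables on first occurrence, and that binding is a write:

```python
def match_shape(runtime_shape, pattern, bindings):
    """Match a concrete runtime shape against a symbolic pattern.
    Symbolic dims (strings) are bound on first occurrence and checked for
    consistency afterwards; constants must match exactly. Mutating
    `bindings` is the side effect that lowering has to account for."""
    if len(runtime_shape) != len(pattern):
        return False
    for actual, expected in zip(runtime_shape, pattern):
        if isinstance(expected, int):
            if actual != expected:
                return False
        elif expected in bindings:
            if bindings[expected] != actual:
                return False
        else:
            bindings[expected] = actual  # the effectful write
    return True


env = {}
assert match_shape((32, 224), ("m", 224), env) and env["m"] == 32
assert not match_shape((32, 16), ("m", "m"), env)  # m is already bound to 32
```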
In summary, in order to build a minimum viable dynamic shape compilation in the most high-quality, robust, and general fashion, the lowering flow requires stages that contain C1 and C2.
We would also like to highlight that having C0, C1, and C2 helps us achieve the goals faster. For example, running fusion on calls into TensorIR means we no longer need developers to annotate the fusion properties of each op, and we no longer need to handle weight-layout rewrites in an ad-hoc way. This reduction in engineering complexity, thanks to the better architecture, helps us bring robust, high-quality dynamic shape support to the community faster.
Dynamic shape compilation, along with first-class C1 and C2 interactions, are needs that we do not support well in TVM today. They offer clear value to our community and are differentiated from the cases Relay already supports.
One of the main topics raised was how Relax fits into TVM in general. We acknowledge that it is impossible to provide the full picture immediately. It is also not necessary to make all decisions at the beginning, as making uninformed decisions is worse. It is common practice for OSS projects to evolve and bring in S1 conversations along the way. For example, when TorchFX was introduced, there was no prediction of TorchDynamo, which was introduced later as things iterated through the codebase and became clearer.
Nevertheless, we would try our best to address how Relax fits into TVM:
C0, C1, and C2 also highlight how cohesively Relax fits into TVM's current codebase, more so than many other modules. In particular, the design aligns deeply with TensorIR, topi, symbolic shape, PackedFunc, and many other modules.
We also view having Relax and Relay co-exist in the codebase as a positive thing and better than the current state:
- We get dynamic shape support for the users who need it, empowering those community members.
- We get more developers thanks to the empowerment, offsetting the burden brought by the module.
- The burden of establishing the module is minimized as it will cause no disruption to the existing modules/flows.
- In many cases, having Relay and Relax work together is a positive thing:
- We use Relay for importing and graph optimization and use relax for TIR-related co-optimization. It is a great combination and a better pipeline for some of the use cases.
These positives alone would take the community a long way. We also acknowledge that we cannot pin down all the S1/S2-level technical decisions at the very beginning. Please see the Future Opportunities section in the TVM Unity connection RFC, which I encourage everyone to check out.
The author also wants to come back to community empowerment. In this case, many in the community have voiced their need for empowerment with grounded technical reasoning. We would encourage everyone to take a step back, as most of the remaining technical objections are less grounded, and we do not need to make all decisions in a single shot. We are looking at a module that is clearly isolated, makes no disruptive change to the codebase, and is needed by a majority of community members. Open-mindedness here would go a long way toward building a welcoming and inclusive community.
At Intellif, people have built, maintained, and extended a DL compilation stack with Relay over the past years. However, we do not think the upstreaming of a new module would break existing functionality or cause confusion; rather, it is a huge opportunity to solve many technical issues that have proven not so easy to handle in Relay, as already emphasized in this discussion thread.
From my perspective, the TVM community is a very inclusive community. We do have modules with some overlapping functionality co-existing without much debate. As examples, we have different runtime implementations for Relay, TE schedule and TensorIR schedule, Ansor and meta-schedule, etc. I hope it is likewise not a problem for the graph AST infra.
Thanks for this great work! Based on our experience at Meituan, dynamic shape is important for our use cases, e.g. OCR and ASR models with dynamic seq_len. Now we can solve these with Relax and the VM runtime :)
For those interested, I think this recent paper shows one way symbolic shapes could be made to work with Relay's type-checking approach (Axon is clearly structured very similarly to Relay), though it would require substantially reworking the existing type relations in Relay. It's rather different from Relax's approach, so it's a possible point of comparison.
I'm a graduate researcher at UW and was a full-time SDE at AWS AI for years; my work is on deep learning frameworks/compilers. I feel like all of us agree dynamic shapes are essential, so I won't spend more time emphasizing how important they are. I'm not a contributor to Relax, but I have been following it for a long time. I don't want to pretend to be neutral: I do think it is quite necessary to welcome Relax, rather than just adding dynamic shape support to Relay.
The main controversy in this thread is about whether to upgrade Relay incrementally or to develop a new IR called Relax. I understand hardware companies appreciate stability, and we can see that CUDA didn't change its interface drastically over the years, what a miracle! There must have been several moments when people wanted to develop new languages/compilers for NVIDIA GPUs, but CUDA survived. This is a lesson we should learn: in the beginning, design things with a vision of the future in mind; then maintain them to a high standard, improve them incrementally, and be customer-obsessed.
This is the ideal story, but we should not ignore that, although CUDA was invented before the DL era, there were already many high-performance computing workloads its designers could refer to. Fortunately, even in 2022, the operators used in DL still align closely with HPC ones and are actually simpler (it's a world of GEMM).

What about the story of (computational) graph-level IRs? The dominant workloads in DL change over time, and I would say they have caused a lot of headaches for framework and compiler designers: first CNNs/RNNs/LSTMs/Tree-LSTMs (structure dynamism is one of the challenges Relay aimed to tackle, but unfortunately those models are used nowhere), then Transformers/GNNs (not as hot as Transformers because of the hardware lottery, but who knows the future). Now we have entered a time where models converge but scale grows significantly: models become larger and larger, and engineers and researchers propose many techniques (checkpointing and rematerialization, quantization, graph substitution, fusion and stitching, sparsification and mixture-of-experts, hybrid parallelism) to optimize DL workloads at compile time. I'm glad to see many of them developed upon TVM, because TVM's design is always up-to-date and supports new workloads quickly. However, Relay's current design cannot take full advantage of these new techniques, and the system has a tendency to become fragmented.

Relax is a great opportunity for us to reconsider the graph-level IR design: prune the redundancies and add new functionality. It's exciting that we can unify different levels of optimization together in TVM Unity, once Relax is accepted by the community. Refactoring makes things simpler, rather than more complex.
Whenever we find it's time to make changes, TVM embraces new designs. This has happened several times in TVM's history. Prior to Relay, there was NNVM, which was deprecated and completely replaced by Relay. The previous Tensor Expression had limited expressiveness, and its schedule-tree data structure could not support tensorization elegantly, so we built TensorIR, which is not only backward compatible but also brings opportunities for developing new dialects (Ruihang and I designed SparseTIR on top of it, and it works pretty well). AutoTVM could not generate scheduling templates automatically, so we built Ansor and MetaSchedule. I would emphasize that the core components of all these updates were upstreamed within several months and did not break any backward compatibility, which is a credit to our hard-working and open-minded contributors and reviewers. Committing to TVM helped these contributors become MLC experts; some of them are PMC members now. I would say none of this refactoring hurt TVM's reputation; on the contrary, it impressed people with TVM's speed in adapting to the future, and made them more willing to try TVM because it is open and driven by innovation.
I really don't understand what's different this time, when it comes to Relax. We have a bigger community, which is awesome, and I definitely welcome your input and constructive suggestions on the future of this project. I view the New Scoped Module RFC as a contract between industrial developers and researchers/engineers like me who work on "toy prototypes": we promise not to touch anything that might affect the user experience, and in return we don't want to be discouraged because our prototypes cannot be upstreamed and only live in some random GitHub repo as toys. I also think the new S0-S1-S2 process is already the most painless approach to delivering new designs, and its effect is equivalent to incremental change. If people take a look at the Relax repo, it already has a huge amount of code and well-written documentation (you can compare it with the official Relay documentation). I think it would be inappropriate to ignore these contributors' devotion, especially individual contributors such as @LeshengJin. TVM has a huge user base of researchers; they are an important part of the community, and they contribute high-quality code rather than just hacking.
Regarding the "lower standard than other communities" issue: TVM has high standards, and this is not a question of standards. If no fundamental changes were allowed in DL infrastructure, Google would have stayed at TF 1.0 and never developed JAX, and PyTorch would not have created so many different compiler infrastructures (I want to share this slide again).
I'm willing to provide more details on every argument if requested.
Best, Zihao
Based on my experience at several organizations, dynamic shape support is obviously very important, particularly with the popularity of large language models. Efficiently supporting dynamic shapes would also be one of the major appealing features of a "modern" DLC. I think the comments above have also reached agreement on the importance of dynamic shapes. The major argument is whether we need separate PRs to land this feature.
IMHO, Relax is already one of the components of Unity, and the current proposal again contains only the most valuable part of Relax, which provides a minimal E2E compilation flow to enable support for dynamic models. This approach has worked well before in both TVM and other open-source projects, since the new component does not block or break current uses and deployments. For example, the first version of Relay also had only the IR, simple lowering, and the necessary passes, to quickly unblock the users/developers (e.g. AWS) who wanted to give it a try. Afterwards, we iterated on it many times to improve both the design and the implementation.
As a contributor to TVM, I would encourage us to focus more on the design itself and spot the design flaws and missing key features that we should address, so that users (some of whom are already waiting for Relax, as mentioned here) can quickly check it out and come back with more insightful feedback or directly contribute to the project.
There were concerns brought up in RFC #95 that this RFC conversation did not cover how the proposal fits into TVM. We agree that discussing the fit is important and would like to refer to related conversations and sections:
- https://github.com/YuchenJin/tvm-rfcs/blob/relax-upstream-rfc/rfcs/0089-relax-upstreaming.md#6-rationale-and-alternatives demonstrates that the design deeply aligns with TensorIR, topi, symbolic shape, PackedFunc, and many other modules.
- https://github.com/apache/tvm-rfcs/pull/89#issuecomment-1267729342 discusses the fit and how Relax can address the e2e dynamic-shape compilation problem.
- https://github.com/tqchen/tvm-rfcs/blob/main/rfcs/0091-establish-tvm-unity-connection.md outlines the fit of the unified composable flow.
I learned a lot from reading through the thread, and I find that most people here come from a systems background: either doing related research in school or heading an engineering team at a company. I would like to share some thoughts from a different perspective, as a TVM user and ML algorithm developer.
I am a graduate student at MIT studying efficient deep learning algorithms and co-design (details on my page, lab site, and our recent project that trains NNs on a 256 kB MCU). We have been loyal TVM users because of its flexibility, high performance, and open-source nature. But when we want to dive deeper and make some customizations, things become complex and Relay is no longer friendly:
- Unnecessarily long call stack between Python and C++: Take `relay.build` as an example. A Relay graph (in Python) first goes through shape checking (in C++), then calls into a wrapper (Python), later feeds into Tensor Expression (either in Python or C++), and is then fed into the VM for compilation (packed functions). ANY step in the middle can raise errors, and developers easily get lost in the pipeline. You can find many users reporting similar issues on the forum, and only very few of them are fortunate enough to get an answer from experienced developers.
- Difficult to add a new operator because of the complex pipeline: In our research, and in many other users' development, adding new operators is a common request. But in current Relay, even if we just want to add a simple identity operator (y = x), we need to:
  - declare an attribute node,
  - write the type-relation check in C++,
  - register the op in C++,
  - describe the compute,
  - describe the schedule,
  - wrap it up in C++,
  - wrap it up in Python.

Seven steps just to define an identity function? Seriously? In PyTorch it won't cost more than 20 lines. This significantly slows the growth of the TVM community: if you check the PR history, the numbers of new operators and new contributors are quite limited this year, while PyTorch receives new operator implementations from the community every day.
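For contrast, here is a hypothetical sketch of what single-step operator registration could look like. The `OP_REGISTRY` dictionary and `register_op` decorator are invented for illustration; they are not the actual TVM, Relay/Relax, or PyTorch API, just a minimal picture of the ergonomics being asked for:

```python
# Hypothetical single-step operator registration (illustrative only;
# OP_REGISTRY and register_op are NOT real TVM/Relay/Relax APIs).
OP_REGISTRY = {}

def register_op(name):
    """Register a compute function under `name` in one step."""
    def wrap(fn):
        OP_REGISTRY[name] = fn
        return fn
    return wrap

@register_op("identity")
def identity(x):
    # y = x: the trivial operator discussed above, defined in one place.
    return x

# A graph executor could then dispatch by name:
result = OP_REGISTRY["identity"]([1, 2, 3])
assert result == [1, 2, 3]
```

The point is not the specific mechanism but the developer experience: compute, registration, and typing live in one definition instead of being split across seven files in two languages.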
- Missing capability to call third-party implementations: Relay syntax does not, at least not easily, support calling third-party backends like cuDNN, OpenVINO, or TensorRT. For the cloud, cuDNN and TensorRT are still SoTA on most benchmarks, and without simple integration we get inferior performance, which will make fewer people choose TVM. For the edge, the situation is even more serious because of hardware diversity. Take the Qualcomm DSP as an example: even though TVM's Hexagon support is in progress, the best solution is still the manually written kernels in SNPE. It is not trivial to call other backends in current Relay: BYOC is difficult to use, and registering custom operators can be quite complex, as discussed in the last point.
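To make this point concrete, here is a minimal, hypothetical sketch of a name-based external-kernel table that would make falling back to a vendor library straightforward. `EXTERN_KERNELS`, `register_extern`, and `dispatch` are invented names and do not correspond to TVM's actual BYOC API; the "reference" backend is a plain-Python stand-in for a real vendor library such as cuDNN or SNPE:

```python
# Hypothetical external-kernel dispatch (illustrative only; these names
# are NOT TVM's real BYOC API).
EXTERN_KERNELS = {}

def register_extern(op_name, backend):
    """Register a backend-specific kernel for an operator name."""
    def wrap(fn):
        EXTERN_KERNELS[(op_name, backend)] = fn
        return fn
    return wrap

@register_extern("matmul", "reference")
def matmul_ref(a, b):
    # Plain-Python reference kernel standing in for a vendor-library call.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def dispatch(op_name, backend, *args):
    """Pick a backend-specific kernel if one is registered, else fail."""
    fn = EXTERN_KERNELS.get((op_name, backend))
    if fn is None:
        raise NotImplementedError(f"no {op_name} kernel for {backend}")
    return fn(*args)

# [[1, 2]] @ [[3], [4]] = [[1*3 + 2*4]] = [[11]]
assert dispatch("matmul", "reference", [[1, 2]], [[3], [4]]) == [[11]]
```

A compiler with such a table could prefer vendor kernels where they exist and fall back to codegen otherwise, which is the ease of integration the paragraph above is asking for.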
I understand those who want backward compatibility so that existing projects are not broken. But we cannot build a ship of Theseus in the real world, and the above issues cannot be easily "improved" within current Relay. If TVM does not embrace new designs and improve its user-friendliness, developers will eventually switch to other tools, and this is indeed happening:
- OneFlow uses MLIR to rewrite its compiler passes, accelerating diffusion models by 4x compared with PyTorch and 1.6x compared with TensorRT.
- Megvii adopts MLIR to minimize its runtime build, generating a YOLOX binary of just 95 kB.
- PyTorch proposes TorchDynamo to speed up training, achieving an average 1.34x speedup over the previous NVFuser.
- ...
I like the TVM project and hope the community stays active. TVM has a huge user base of researchers, and Relax can allow them to easily contribute their code and ideas to the repo, instead of resorting to tricky hacking and creating a separate repo for each project. This is important for an open-source community -- just recall how MXNet lost its market share and why PyTorch could beat TensorFlow even though it was released a year later. TVM should consider upstreaming Relax given its more thoughtful and user-friendly design, well-written documentation/tutorials, and the painless S0-S1-S2 upgrade path.
I would like to discuss more if there are any comments or questions.
Thanks everyone for the discussions! A brief recap of our discussions so far:
- We are certain that Relax supports dynamic-shape workloads that are not supported by the current TVM, which can immediately benefit many community members and users.
- On why Relax should be brought into the project today: we showed that having Relax and Relay co-exist in the codebase is a positive thing in several respects (https://github.com/apache/tvm-rfcs/pull/89#issuecomment-1267729342). The path to moving TVM to a Relax-only project will be long, so Relax and Relay co-existing is necessary for the foreseeable future, just like how TorchFX co-exists with TorchScript in the PyTorch project. We acknowledge the concern that Relax can bring confusion to some members about which IR to contribute to, but we also encourage the community to consider that Relax directly brings dynamic-shape compilation to TVM while the original workloads can still be compiled by Relay, along with other factors including community empowerment and the scope of this proposed module.
- It's been pointed out that it would be helpful to lay out the ideal scenario for how we see Relax and TVM Unity evolving over time in the TVM project. The reason we built Relax is that we are confident Relax, both in its current and future forms, will significantly improve TVM, and we have outlined the future opportunities in https://github.com/tqchen/tvm-rfcs/blob/main/rfcs/0091-establish-tvm-unity-connection.md#4-future-opportunities-after-unity-connection. Nevertheless, it is helpful to explain in more detail given our current state of knowledge, so we will add the discussions of integrating Relax into TVM default flows and the consolidation/deprecation of Relay and Relax to the roadmap.
After seeing so many voices in this thread, I think it is important to provide a reply here.
I am wearing the Apache TVM hat as an ASF member and Apache TVM PMC member.
First of all, I would like to say thank you, everyone, for sharing your voices here. This post has received support from more than eight organizations from both industry and academic backgrounds. Your voices are very important to the community and will not be ignored. As many said, we would love the TVM community to continue being inclusive and innovative while maintaining the stability of existing developed components.
I also would like to come out and acknowledge the positions so far:
The position that @leandron made so far was:
- We do not like to be in a state where relax and relay coexist without deciding the commitment of one replacing another.
- As a result, due diligence of such a replacement is mandatory before merging the proposal.
I would like to explicitly acknowledge that the above positions are based on valid rationales and can be a possible way of doing software development.
I think the positions raised by @YuchenJin and others were:
- Relax could have the potential to replace Relay, but the proposal as it stands only proposes to have the two modules coexist.
- This is just like how most OSS projects bring in modules and evolve (e.g. TorchFX was brought in even though it overlaps with TorchScript, with no plans to immediately phase out TorchScript). The modules can coexist and evolve, and we continue conversations about future co-evolution.
- Having Relax and Relay coexist in the codebase is already a positive step that we should take, especially considering community empowerment.
These are also valid rationales and can be possible ways of developing things.
As a first step, I would like to acknowledge each other's positions as grounded in valid rationales. The main difference is a disagreement on how we should do things as a community.
Such a decision should be made collectively as a community, considering all the factors involved, including code and community factors. We all make our suggestions taking innovation, stability, and community into account.
When evaluating a proposal and empowering our community members, we expect every one of us to continue having a constructive conversation, considering the latest context.
While the initial comment made by @leandron is valid on its own, I would love to see us re-evaluate our positions considering all the factors in the latest context, including community empowerment and the collective views of other members. By no means do we simply seek to dismiss the original position -- I apologize if it came across that way. Instead, we want to acknowledge each view, recognize that our disagreements are about the hows, take the community's view into consideration, and have constructive public conversations in service of the many who have voiced their support here, as we continue to empower each other to build an inclusive, welcoming, and supportive community.
Thank you!
My position:
- Relay and Relax are going to co-exist as parallel submodules in TVM, and one should not affect the other at all;
- I am committed to keeping the Relay source code in "main" for the foreseeable future, without hinting at potential deprecation;
- Having Relax in "main" >>> having Relax in a separate branch > not having Relax at all.
Five years ago, we started with a community that comes with a common vision in mind – enabling machine learning engineers to optimize and run computations efficiently on any hardware backend.
Five years later, the fields of machine learning and MLC (ML compilation) have undergone rapid change. That same vision is still shared by this community. This is why many of us still feel so fresh when writing code patches, bug fixes, architectural refactors, and new features. We are here today thanks to a diverse community that comes with different perspectives and areas of focus but still aligns around that common vision.
As a project, we benefit from different perspectives to survive in the ever-changing and competitive area of ML compilation. Hardly can one predict every detail of the future (just observe the set of recent changes such as ChatGPT and Stable Diffusion). Hardly can one definitively assert from the very beginning that one approach will be better than another. Enabling diverse possibilities helps to move the project forward while serving different needs.
As a community, while we care about different subsets of modules and do not always need to work on the same thing, there is always an overlap of interests, regardless of whether it is the graph, FFI, TensorIR, or backend that sustains collaborations among different people. Most importantly, we come with a mindset of empowering each other under the same vision.
Thank you, everyone, for participating in this thread.
This thread arrived at its current state due to different perspectives on project procedural operations (whether a detailed migration plan and a commitment to migration are necessary for a new module proposal). There is common agreement that migration (if it happens and is proposed) would require a lot of detail and community buy-in, but there are different opinions about when and how that should happen.
On behalf of the TVM PMC, I would like to recommend an initial step to help us achieve the following goals for different members of the community:
- G0: Get us out of stagnation and empower the community, including many who shared their support in this thread, to participate in unity development in the TVM community.
- G1: Give us some time to answer questions and provide examples for those who have expressed the need for more detailed evidence and a feasibility analysis of migrating some modules.
Specifically, we recommend following an existing practice in projects like Hadoop: empowering related development in a branch. The ASF mechanism allows any committer to create a branch in the apache repo and do collaborative development there at their own pace. Per our existing process, merging a branch into main still requires lazy consensus. Branch development offers flexibility while accepting the risk of being blocked when merging to main; as a result, there are general incentives to keep alignment with the majority of the community and to stay engaged to get buy-in. Branch development offers a way to collaborate on a possible but not definitive future of the project, as a branch can have different outcomes: being partially merged, continued development, or abandonment. Enabling different perspectives is important for us both as a project and as a community.
The TVM PMC re-affirmed that branch development can be used as an option for the project and specifically for development around TVM Unity. We would like to offer it as a possible option for the community and as a first step of execution, with the goal of getting the related pieces into main. I wrote a more detailed post, on which we would love to get everyone's feedback. Of course, this is only one possible option, and community members can freely choose their ways of participating.
Developing in a branch will also give some time buffer to address G1. It is valuable to answer questions and have grounded conversations to give more information to the members who are not yet on board with the new module. Notably, for many community members, detailed code examples, benchmarks, and continued engagement are necessary to get broader community buy-in. We recommend having focused discussions on the questions of interest (e.g. giving concrete code tutorials for BYOC) to help the community members who have related questions. We encourage such continued conversations in forum threads, meetups, and development interactions, with the goal of surfacing as much information as possible. Again, such interactions aim at demonstrating possibilities, not at warranting deprecation or migration, since that choice should still lie in the hands of the community. Hopefully, they give us a more comprehensive picture for making follow-up decisions collectively.
Over the winter break, I started to do more coding, and I was really fascinated to see that the passion still runs deep in my heart (and, I believe, in many of us) after so many years, thanks to this community and our common vision. As a community member, I am really motivated to spend focused energy helping to build concrete code examples and tutorial materials for G1.
Please also check out this post for more details.
An update: thanks to the efforts of many community members, we are now at a point where the initial foundational items in the unity branch are established.
One goal of G1 is to give some time to answer questions and provide examples for those who have expressed the need for more detailed evidence and a feasibility analysis of migrating some modules.
As part of this effort, community members have been posting tutorials on related topics of interest in the forum. I would encourage folks in this thread to ask for whatever additional tutorials you want to see and/or raise specific technical questions.
Just another update: it is great to see unity being developed and used for dynamic shapes and emerging use cases.
One goal of G1 is to give some time to answer questions. There are more topics of related interest (some may relate to the questions in this thread): https://discuss.tvm.apache.org/c/development/unity/14. Please check it out and feel free to participate in the discussions and technical questions.