moose
moose copied to clipboard
Making (& keeping) the DSL consistent and complete
We are often adding operations that are incomplete with respect to the full DSL we could implement them for, usually because we are implementing with a particular script or demo in mind. If we truly want the eDSL to be a "proper" DSL similar in flexibility to other ML frameworks, this constitutes a significant amount of technical debt that is blocking us.
In the case of #713, we could have implemented mux
on host placements for the other numeric dtypes, but there are many other examples of this elsewhere:
-
expand_dims
,transpose
,reshape
missing some placement implementations -
identity
andabs
op missing kernels or edsl implementations - logical dialect only supports tensors with floating-point and fixed-point dtypes, when it could, and in some cases should, support integer types
- some kernels having implementations that are just
unimplemented!
blocks (recently saw this with IdentityOp having a method calledmissing_kernel
)
The most common cause for these inconsistencies/incompletions is that we are almost always adding operations with particular scripts or demos in mind. But by developing this way, we are adding more technical debt with every new operation. So this is not just an implementation ticket, but also an engineering process ticket. I propose a three-step solution:
- Create an implementation ticket for completing the eDSL. The goal of that ticket will be that all edsl operations can be compiled & executed when used with any dtype/vtype and placement type that make sense given the semantics of the operation and given the constraints of the existing replicated/additive protocols themselves (i.e. is it practical to implement the op with current protocols without doing extensive discovery/research).
- Document any missing/future combinations, or any combinations that don't make sense given the op semantics. For now this could just be done by adding relevant assertions and type checks in the op's edsl function (see comments in #713 's
edsl/base.py
for an example). Eventually we will want to bubble this up to API docs, and theedsl/base.py
file could be one good candidate to use as reference for those docs. - Make this a part of our everyday code review process for PRs that are adding new operations. If a PR adds an operation that is missing implementations for combinations of types/placements that are viable, we either request changes to the PR or insist on a follow-up issue being made to track progress for the incomplete implementations. I would also insist on these issues being picked up directly after the original PR is merged, but n cases where that's not possible at least the ticket will be there to return to for someone else to pick up.
The major discussion topic that should be resolved here before we execute on this plan is to figure out whether this is an important-enough concern that it should be added to the existing clean-up milestone. Once that's been decided and any relevant follow-up issues have been created, we can close this.