Invented opcodes for named access to locals/arguments
There are a few invented opcodes in the Opcode type, which might not be the best thing to have if we want to parse CIL because the parser would have to case-analyze what it is parsing to build the AST.
Examples:
data OpCode
= ...
| LdargN DottedName
| LdlocN DottedName
| LdlocaN DottedName
| StlocN DottedName
-- CIL also defines a family of starg opcodes which we don't have yet (there's bit of a philosophical issue with "storing" to your own arguments, but it does exist), for symmetry we would presumably create a StargN DottedName alternative
I propose:
data Location = Offset Offset
| Name DottedName -- or is it LocalName, I'm confused about the necessity of DottedName
data OpCode
= ...
| Ldarg Location
| Ldarga Location -- not implemented yet, but a trivial extension to implement
| Ldloc Location
| Ldloca Location
| Starg Location
| Stloc Location
The ldargN, ldlocN, ldlocaN, stlocN helpers in Build can stick around, or we could play typeclass tricks to let you write:
[ ldc_i4 42
, dup
, stloc 2
, stloc "answer"
]
as:
class IsLocation t where -- kinda unfortunate you can't have both a type named location and a class named location, this may not be the best name
toLocation :: t -> Location
instance IsLocation Offset where
toLocation = Offset
instance IsLocation DottedName where -- (or is it LocalName, or both?)
toLocation = Name
ldloc :: (IsLocation loc) => loc -> MethodDecl
ldloc loc = mdecl $ Ldloc $ toLocation loc
-- and so forth...
I'm not 100% sure the type class hackery is really any better than the one where we keep the ldlocN, ldargN, ... family of helpers. It improves readability of the code, it maybe makes it easier on some beginners (less to remember, guessing ldloc "x" works for those who guess that, guessing ldloc 0 works for those who guess that) but if a beginner makes a mistake the type error message will be more confusing.
I'm not sure I understand the problem.
What to you mean by "parse CIL"? Parsing a binary, or parsing the textual representation of CIL?
In the case of parsing a binary, the parser can always chose the Ldloc version.
A parser for textual CIL, can make the choice based on what's there in the text. The named LdlocN may not be an instruction in the binary serialization of CIL, but it is a feature of ILAsm, and part of de AST for textual CIL.
Yeah, I suppose it isn't that bad. Still, it seems more in line with the rest of things to have one alternative in OpCode (avoids duplicating analysis code) and two builder methods.
Ah, that's a good point about duplicating analysis code. I agree with adding the Location data type, and keeping the two builder functions.
I'm not a fan of type classes for these sort of things. This isn't some very general concept (like Eq), and the type errors are indeed confusing.
Agreed. I prototyped the type class approach to this problem and it doesn't work out well at all. It works sort-of-ok with the GHC overloaded-string-literals extension, but at a cost of worse ugliness. I'm all for scrapping that guess.