flambeau
defModule macro to define neural nets without inline C++
Currently we need to inline C++ code because Nim cannot generate C++ code with default member values, nor wrap types that lack default constructors (https://github.com/nim-lang/Nim/issues/4687).
This leads to the following code:
https://github.com/SciNim/flambeau/blob/fe82c68408f30f497ceeac9135b0163249f0fb7f/proof_of_concepts/poc09_end_to_end.nim#L12-L26
with emitTypes as a workaround for https://github.com/nim-lang/Nim/issues/16664:
https://github.com/SciNim/flambeau/blob/fe82c68408f30f497ceeac9135b0163249f0fb7f/flambeau/cpp/emitters.nim#L3-L13
Instead, it would be more convenient to have a macro defModule (name subject to bikeshedding) with the following syntax:
```nim
defModule:
  type Net = object of Module
    fc1: Linear = nil
    fc2: Linear = nil
    fc3: Linear = nil
```
It would generate the proper C++ code (nil mapped to {nullptr}) and rewrite the type section with the proper {.pure, importcpp.} pragma.
This is similar to https://github.com/nim-lang/RFCs/issues/126 and https://github.com/nim-lang/RFCs/issues/252.
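For illustration, a rough sketch of what such an expansion could look like; the exact emit layout and pragma set are assumptions, not a settled design:

```nim
# Hypothetical expansion of the defModule example above:
# nil defaults become {nullptr} in the emitted C++ struct.
{.emit: """/*TYPESECTION*/
struct Net: public torch::nn::Module {
  torch::nn::Linear fc1{nullptr};
  torch::nn::Linear fc2{nullptr};
  torch::nn::Linear fc3{nullptr};
};
""".}

# The Nim type section is rewritten to import the emitted struct.
type Net {.pure, importcpp.} = object of Module
  fc1: Linear
  fc2: Linear
  fc3: Linear
```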
Implementation details:
- To interpolate with the proper symbol, use the syntax {.emit: ["type something {", NimTypeInterpolated, " field{nullptr};"].} (see the sketch after this list)
- With interpolation we should properly support both C++ types, without extracting from their importcpp pragma, and Nim types, without fiddling with gensym.
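A minimal self-contained sketch of that emit-array interpolation, assuming the /*TYPESECTION*/ placement trick applies here; the struct and field names are made up:

```nim
type MyNet = object   # stand-in for a module type
  dummy: cint

# The Nim symbol MyNet is interpolated with its mangled C++ name,
# so no fiddling with gensym or importcpp extraction is needed.
{.emit: ["/*TYPESECTION*/ struct Holder {", MyNet, " net; };"].}
```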
Thus far I've got an untyped macro that can generate the emitted C++ code and add the correct pragmas. Would it make sense to also generate an init proc, like the one here: https://github.com/SciNim/flambeau/blob/cba9cc541d6af8d4987e98e794d12ffb51629fdb/examples/pytorch/cpp/mnist/mnist.nim#L24-L47
We could use your syntax from above (if we assume all types are initialized with nullptr) and write it like this:
```nim
defModule:
  type Net = object of Module
    conv1: Conv2d = (1, 10, 5)
    conv2: Conv2d = (10, 20, 5)
    conv2_drop: Dropout2d # no value means passing no parameters
    fc1: Linear = (320, 50)
    fc2: Linear = (50, 10)
```
The values would then be passed to the type's init proc. We could make generating it optional as well, so people could choose. But this does of course build upon the fact that we are using an untyped macro, as the syntax isn't legal Nim code.
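For reference, a rough sketch of the init proc such a macro could generate for the Net above, assuming init overloads for the builtin layers as in the linked mnist example:

```nim
# Hypothetical generated init: each field's tuple is splatted into
# the corresponding layer's init call.
proc init(T: type Net): Net =
  result.conv1 = Conv2d.init(1, 10, 5)
  result.conv2 = Conv2d.init(10, 20, 5)
  result.conv2_drop = Dropout2d.init()
  result.fc1 = Linear.init(320, 50)
  result.fc2 = Linear.init(50, 10)
```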
I'm not too used to Torch yet, so I'm not really sure whether this approach would work in general for a typical Net; any feedback is appreciated.
I think something like this is better:
```nim
defModule:
  type Net = object of Module
    conv1 = Conv2d(1, 10, 5)
    conv2 = Conv2d(10, 20, 5)
    conv2_drop: Dropout2d # no value means passing no parameters
    fc1 = Linear(320, 50)
    fc2 = Linear(50, 10)
```
It avoids floating untyped parentheses and mimics object construction.
The tricky parts are:
- shared weights, which could get a dedicated syntax (a hypothetical sketch follows this list)
- nested nets, for example initializing a Net2 from a Net
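A purely hypothetical sketch of what a dedicated shared-weights syntax could look like; nothing in this thread settles on it:

```nim
defModule:
  type Siamese = object of Module
    encoderA = Linear(128, 64)
    encoderB = shared encoderA  # hypothetical marker: reuse encoderA's weights
```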
That does indeed look more intuitive! And with the power of pattern matching, not that much harder either.
- Yeah, I'll keep that in mind! I'll try my best to make the code extendable for future special syntaxes.
- That one sounds a bit trickier. Are you thinking about transfer learning, using the weights from a pretrained network or something like that? Then I could see the problems if one wants to modify it by replacing the last layers or something.
And another thing that I think got lost yesterday: should we support Nim types in a Net? I don't really see how we could include them in the emitted C++ types, so I'd say no. Plus, we have no way of telling whether something is a Nim type or a C++ type in an untyped macro.
I think I realized what you meant by nested nets now. Something along these lines, right?
```nim
defModule:
  type
    Net = object of Module
      conv1 = Conv2d(1, 10, 5)
      conv2 = Conv2d(10, 20, 5)
      conv2_drop: Dropout2d # no value means passing no parameters
      fc1 = Linear(320, 50)
      fc2 = Linear(50, 10)
    Net2 = object of Module
      net1: Net
      fc = Linear(100, 100)
```
If we go for an untyped macro we'd need a separate syntax for nested nets. Perhaps we could do it like this: custom Nets are specified in "Nim style" (net1: Net) and built-in layers are defined using our new "construction style" (fc = Linear(100, 100)):
```nim
Net2 = object of Module
  net1: Net
  fc = Linear(100, 100)
```
This would be easy to parse, at least. But then again, it depends on what types we should allow in a Net.
Why not just do:
```nim
defModule:
  type
    Net = object of Module
      conv1 = Conv2d(1, 10, 5)
      conv2 = Conv2d(10, 20, 5)
      conv2_drop: Dropout2d # no value means passing no parameters
      fc1 = Linear(320, 50)
      fc2 = Linear(50, 10)
    Net2 = object of Module
      net1 = Net()
      fc = Linear(100, 100)
```
in those cases?
@Vindaar Because we need to be able to differentiate between them when generating the emitted C++. The code for Net2 should be:
```cpp
struct Net2: public torch::nn::Module {
  Net net1{nullptr}; // here's the difference: no namespace
  torch::nn::Linear fc{nullptr};
};
```
But the generated code will be this if we can't tell the difference between custom Nets (for which we generate the C++ code) and the built-in ones:
```cpp
struct Net2: public torch::nn::Module {
  torch::nn::Net net1{nullptr}; // here's the difference: wrong namespace
  torch::nn::Linear fc{nullptr};
};
```
Built-in layers like conv2_drop: Dropout2d can also lack initial parameters, and could (and perhaps should) be specified as conv2_drop = Dropout2d(). Perhaps that clashes with what you suggested?
Ah, but that can be easily solved by keeping track of all nets that were user generated (e.g. Net and Net2 in this example) and only adding the torch::nn prefix for those which are not user defined, no?
Yes, that should work, but then we're back at how to do that safely between macros, as people could use multiple defModule calls and use types defined in one in another. And thus we would need a shared compile-time table/set. (macrocache doesn't have support for checking whether a key is in a table or not, btw.)
Edit: this approach also makes the C++ generation more complex, as it must first parse all the code and then create all of the C++ structs. The current approach just creates the C++ code while parsing, which is simpler. With different syntaxes that could still be the case, but that's just me being lazy.
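A sketch of what such a shared compile-time registry could look like with macrocache; since CacheTable has no key-membership check, membership is tested by iterating over pairs (the table name and helpers are made up):

```nim
import std/[macros, macrocache]

const userNets = CacheTable"flambeau.userNets"

proc isUserNet(name: string): bool {.compileTime.} =
  # CacheTable lacks hasKey, so scan the pairs instead
  for key, _ in userNets:
    if key == name:
      return true

proc registerNet(name: string) {.compileTime.} =
  # `[]=` errors if the key already exists, hence the check
  if not isUserNet(name):
    userNets[name] = newLit(true)
```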
Oh ok, now I'm starting to see what you're talking about.
A shared compile-time table is something that should be supported anyway, imo, so it's only a matter of time until it is implemented. Maybe one of us can help out improving the macrocache module for the purpose?
With the fully untyped, no-state approach, another option might be something like wrapping the name in a call, e.g.:
```nim
defModule:
  type
    Net = object of Module
      conv1 = Conv2d(1, 10, 5)
      conv2 = Conv2d(10, 20, 5)
      conv2_drop: Dropout2d # no value means passing no parameters
      fc1 = Linear(320, 50)
      fc2 = Linear(50, 10)
    Net2 = object of Module
      net1 = custom(Net())
      fc = Linear(100, 100)
```
i.e. custom to indicate this is not a builtin torch type. Bikeshed on the name: custom, local, ... It leaves the syntax alive, is easy to extract, and can be documented well.
Yeah, it definitely should be supported, and we probably should help improve it. But it feels like it's quite convoluted to handle everything in an IC-safe way without digging our fingers into how the internals of IC work.
That approach is more feasible and future-proof in my opinion. And you have a good point in that it is easier to document and understand than different syntaxes. The hardest part is of course finding a good name for it, but custom is a good name that may very well stay. And it looks even more like a syntactic thing with net1 = custom Net() (that reminds me, I'll have to handle two cases :P)
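A sketch of how the macro could detect that marker when walking the untyped AST, covering both spellings (the helper name is made up):

```nim
import std/macros

proc isCustomNet(rhs: NimNode): bool =
  # custom(Net()) parses as nnkCall, `custom Net()` as nnkCommand;
  # in both cases the marker is the first child
  rhs.kind in {nnkCall, nnkCommand} and
    rhs.len == 2 and
    rhs[0].eqIdent("custom")
```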
Mnist now works without having to manually emit the C++ type, and the init proc is generated automatically as well. I'm not really sure how to handle the nested Nets though, specifically how to emit the C++ code for them. I thought I could just do:
```cpp
struct Net: public torch::nn::Module {
  torch::nn::Conv2d conv1{nullptr};
  torch::nn::Conv2d conv2{nullptr};
  torch::nn::Dropout2d conv2_drop{nullptr};
  torch::nn::Linear fc1{nullptr};
  torch::nn::Linear fc2{nullptr};
};
struct Net2: public torch::nn::Module {
  torch::nn::Linear fc1{nullptr};
  Net net1{nullptr}; // error here
};
```
But then I get this error:
```
error C2664: 'Net::Net(const Net &)': cannot convert argument 1 from 'nullptr' to 'const Net &'
note: Reason: cannot convert from 'nullptr' to 'const Net'
note: No constructor could take the source type, or constructor overload resolution was ambiguous
```
It feels like Net isn't really a shared_ptr as it should be? Any ideas on how to approach it? Must we define it as a pointer ourselves, or does the inheritance handle that for us?
Great work!
About the issue, maybe this doesn't make any sense, but: maybe the issue is simply that Net does not define a constructor that takes a single pointer? So the solution would be to add a simple constructor line for each custom module we create? I tried checking the source for what kind of constructors the already-implemented classes provide, but when I saw the TORCH_MODULE macro here:
https://pytorch.org/cppdocs/api/program_listing_file_torch_csrc_api_include_torch_nn_modules_linear.h.html
I kinda gave up.
To be honest, I don't even know exactly what providing a value within a struct definition does. I suppose a default value?
Thanks!
That sounds very reasonable: the built-in layers are shared_ptrs, so making nullptr their default makes sense, but not for ours. So the question is what to put there instead. I had a look here: https://pytorch.org/tutorials/advanced/cpp_frontend.html and they seem to define a struct first and then run TORCH_MODULE to create a pointer-y type of it somehow. But I'm not sure we want that? We would have to change all net.param to net->param, right? Plus they have the initialization in the struct body itself, which sucks for us. I'm going to think about this in the sauna for a moment now.
Something as easy as removing the default value altogether seems to have stopped the compiler from complaining, at least: Net net1;. Sometimes it's so easy. I don't know if it works yet though; I must test whether a nested net runs, but that's for tomorrow.
Indeed, that does make sense. Now you're just saying "this struct contains a member Net", same as when defining a Nim object with a field Net.
The question is: should we eventually support those default values? I have to read your link up there in more detail, but it does seem like TORCH_MODULE mainly turns a type T into a fancy shared_ptr<T> wrapped in an object.
Yeah, exactly.
Idk tbh. We are kind of handling the default values ourselves in our generated init proc, so it seems unnecessary to make the generated C++ code even more complex?
Yes, that's my understanding of it as well. For the builtin layers it doesn't matter, because we don't access their members, just a few select functions that we have wrapped. But for a Nim Net we do want to access the fields, and that's harder if we wrap it in a shared_ptr, I think. But maybe we can fix that somehow as well; I have to think a bit more on it.
Personally, I would probably try to see how far your current approach can go. It's "only" a macro and we can always rewrite parts of it. :)
Agreed! Both because it is simpler and because I don't want to start over :P
Tested it, and initializing nested Nets seems to work (tomorrow is another day, namely today :P)
Something that doesn't work is defining custom Nets in one file and importing them into another, though. That makes sense, as the emitted C++ in the original file isn't visible to the file that imports the types. I guess this is an issue for us? A hack I found to "solve" it is to define the types in a template, then import and invoke the template; this way all emitted types are visible. Here's an example:
```nim
# nets.nim
template importNets*: untyped {.dirty.} =
  defModule:
    type
      Net* = object of Module
        conv1* = Conv2d(1, 10, 5)
        conv2* = Conv2d(10, 20, 5)
        conv2_drop* = Dropout2d()
        fc1* = Linear(320, 50)
        fc2* = Linear(50, 10)
      Net2* = object of Module
        net1* = custom Net()
        fc1* = Linear(4, 1)
```

```nim
# training.nim
import ./nets
importNets()

var model: Net
model.init()
```
I don't know how we could solve it otherwise, unless we can make the emitted C++ code visible across files. It's the C++ compiler that complains in the end that Net is undeclared.
I have done some further testing with nesting now, and it doesn't seem to work unless we register the subnet, and registering seems to expect a shared_ptr to a net. So we may have to use TORCH_MODULE after all :/
The only thing that would be cumbersome is that we couldn't access the members with dot notation and instead must use arrows. Is that acceptable? Otherwise, we could try writing a dot macro and rewriting all unresolved dot operations to arrows: https://nim-lang.github.io/Nim/manual_experimental.html#special-operators-dot-operators
So something like net.conv1 would become net->conv1.
Any better suggestions now that we have to use the pointer version of the nets?
Edit: I realized that the dot macros won't work, because the Nim compiler thinks it can access the fields itself, so the macro is never invoked. :/
I'm not sure what you mean by "we have to use ->". Do you mean within the emit statements of the macro?
No, I mean when we want to access the fields of our Nets (in the forward proc, mostly). If our Net is a shared_ptr (it's a wrapper around a shared pointer in reality), then in C++ we can't access the fields with net.conv1.forward(x), but instead must use net->conv1.forward(x). But the C++ code that the Nim compiler generates uses a dot instead of an arrow. So it has nothing to do with emit, just with what code the compiler generates.
Sorry for spamming, just want to get my thoughts down in writing while they are on the top of my head.
Here's how it works in C++:
```cpp
struct LinearImpl : torch::nn::Module {
  // define the members here
};
TORCH_MODULE(Linear);
```
LinearImpl is an ordinary Module with value semantics. Linear, on the other hand, is an nn::ModuleHolder, which is a wrapper around shared_ptr<LinearImpl>. And it's the latter that seems to be the recommended way of doing things in LibTorch. The question then is how to wrap this in Nim.
We can't really wrap LinearImpl directly, because then we won't get a shared pointer. And if we wrap Linear we get the pointer, but now we can't access the fields (according to the Nim compiler, which only sees an "empty" imported type). So my guess would be that we need to wrap the Linear type but somehow provide the fields of the struct to the Nim compiler, and make sure it outputs C++ code that uses -> instead of . for field access.
A naive idea would be to generate, in the macro, a proc for each and every field using importcpp: "#->`fieldName`", like this:
```nim
type Net {.importcpp.} = object of ModuleHolder
# it has fields conv1, conv2, fc1

# generate the procs for accessing them:
proc conv1*(net: var Net): ModuleHolder {.importcpp: "#->conv1".}
proc conv2*(net: var Net): ModuleHolder {.importcpp: "#->conv2".}
proc fc1*(net: var Net): ModuleHolder {.importcpp: "#->fc1".}

# Now it can be used as an ordinary object:
var net: Net
net.init()
let y = net.conv1.forward(x)
```
Another approach would be to wrap LinearImpl in a shared_ptr<LinearImpl> ourselves and then use a dot macro (Mamy has already implemented one for methods, I think, so doing one for field access should be doable). This way the Nim compiler would know of the fields' existence without us having to tell it about them. In this case the code would be something like:

```nim
type Net = CppSharedPtr[NetImpl]

var net: Net
net.init()
let y = net.conv1.forward(x) # `.` macro handles it so we generate `->` instead of `.`
```
I haven't implemented either of them yet, though, so no idea whether they could work.
Edit: not sure if it was clear, but the reason we need the module as a pointer is that register_module, which is used to register submodules (nested nets), requires the module to be a ModuleHolder or a shared_ptr.
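For context, a hedged sketch of a Nim binding for that API; the C++ signature is LibTorch's, but this wrapper is an assumption, not flambeau's actual code:

```nim
# LibTorch (C++): template <typename ModuleType>
#   std::shared_ptr<ModuleType> register_module(
#       std::string name, std::shared_ptr<ModuleType> module);
proc register_module*[T](parent: var Module, name: cstring,
                         module: CppSharedPtr[T]): CppSharedPtr[T]
  {.importcpp: "#.register_module(@)".}
```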
Here comes a bit more rambling from me. I've thought a bit more about this, and I think doing it the torch::nn::ModuleHolder way is the way to go. The strongest reason is that we'd have to implement it either way, because the built-in layers like Linear and Conv2d are ModuleHolders, not Modules. So my thinking is that we should do it the same way LibTorch does, something like this:
```nim
type
  Module* {.bycopy, pure, inheritable, importcpp: "torch::nn::Module".} = object
    ## A LibTorch neural network module that can be inherited from
  ModuleHolder*[T: Module] {.bycopy, pure, importcpp: "torch::nn::ModuleHolder<'0>".} = object
    ## A LibTorch wrapper around std::shared_ptr<Module>

type
  LinearImpl* {.pure, bycopy, importcpp: "torch::nn::LinearImpl".} = object of Module
    options* {.importc.}: LinearOptions
    weight* {.importc.}: Tensor
    bias* {.importc.}: Tensor
  Linear* {.pure, bycopy, importcpp: "torch::nn::Linear".} = ModuleHolder[LinearImpl]
    # Linear is a shared_ptr underneath.
    # The ptr is bycopy, which results in the actual data being byref.
  Conv2dImpl* {.pure, bycopy, importcpp: "torch::nn::Conv2dImpl".} = object of Module
    options* {.importc.}: Conv2DOptions
    weight* {.importc.}: Tensor
    bias* {.importc.}: Tensor
  Conv2d* {.pure, bycopy, importcpp: "torch::nn::Conv2d".} = ModuleHolder[Conv2dImpl]
    # Conv2d is a shared_ptr underneath.
    # The ptr is bycopy, which results in the actual data being byref.
```
Then we just define a dot macro to handle the dereferencing of a ModuleHolder[Module] to the underlying Module, and then everything should hopefully work.
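A minimal sketch of that dereferencing step, relying on ModuleHolder's operator* from LibTorch (the proc name is made up; the dot macro would insert this call automatically):

```nim
proc deref*[T: Module](holder: var ModuleHolder[T]): var T
  {.importcpp: "(* #)".}

# manual usage until a dot macro hides it:
# let w = net.fc1.deref.weight
```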
Of course, Nim doesn't like it when we use inherited types as generic types :P I'll have to find another way to do this, it seems.
Heureka! I think I've finally got it to work. So I'm leaning more and more towards using CppSharedPtr now, because then we can get at the underlying type. Hopefully nothing breaks until I'm back from a break now.
@mratsim Any reason why you went with
```nim
var net: Net
net.init()
# instead of
var net: Net = Net.init()
```
The second way is how the builtin layers are initialized so for consistency we could use it for custom Nets as well?
Ok, now I've got something that actually works for nested nets as well. I've added support for parameters (Tensors) too. So my final proposal for the syntax is this:
```nim
defModule:
  type
    Net* = object of Module
      conv1* = Conv2d(1, 10, 5)
      conv2* = Conv2d(10, 20, 5)
      conv2_drop* = Dropout2d()
      fc1* = Linear(320, 50)
      fc2* = Linear(50, 10)
    Net2 = object of Module
      anotherNet = custom Net
      w = param randn(100, 100)
      bias = param randn(100)

var net = Net.init()
var net2 = Net2.init()
```
The only thing that feels redundant is object of Module, mainly because it isn't entirely correct: Net is a CppSharedPtr[Module] after all. The only reason I could see for keeping it would be if we want to allow the user to define non-net types in defModule as well, but I don't know.
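To make the proposal concrete, here is a rough sketch of the init the macro might generate for Net2. Every name in it is an assumption: a hypothetical Net2Impl, make_shared, and thin wrappers over LibTorch's register_module/register_parameter:

```nim
proc init(T: type Net2): Net2 =
  result = make_shared(Net2Impl)
  # submodules and parameters must be registered so LibTorch can
  # track them (for parameters(), to(device), serialization, ...)
  result.anotherNet = result.register_module("anotherNet", Net.init())
  result.w = result.register_parameter("w", randn(100, 100))
  result.bias = result.register_parameter("bias", randn(100))
```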
Now you are free from my spamming here, for now.
> @mratsim Any reason why you went with var net: Net; net.init() instead of var net: Net = Net.init()? The second way is how the builtin layers are initialized, so for consistency we could use it for custom Nets as well?
Because some layers might not be copyable and so I want to avoid assignments.
@mratsim Okay! But if we wrap all of them in shared_ptr, we won't have to deal with that, right? And from what I know, most, if not all, built-in layers seem to be ModuleHolders, and thus are fancy shared_ptrs with reference semantics. Do you know of any case that I haven't covered here?