Initialization discussion
Hi all,
First, well done on your hard work! I have just read the tutorial and have a few remarks and questions. They may sound critical, but don't get me wrong: I'm sure your design choices are well thought out, and I just want to understand them.
- Why do we need to call brick.initialize explicitly? Couldn't the initialization be done when apply is called, or during allocation?
- When is lazy initialization necessary or convenient? (Isn't the term "lazy configuration" more suitable?) The example given in the tutorial (connecting two networks with a linear layer) could be done just as easily without lazy initialization. If it is neither necessary nor convenient, allowing this much flexibility may be dangerous, especially if Blocks wants to stay quite low-level. Having different ways to do the same thing can be confusing for users and tedious for us to develop (the smaller the library, the better).
- "Many neural network models, especially more complex ones, can be considered hierarchical structures. Even a simple multi-layer perceptron consists of layers, which in turn consist of a linear transformation followed by a non-linear transformation. As such, bricks can have children." The terminology (children, parent) might be slightly confusing. The hierarchy you mention is actually a sequence of bricks in which each brick has one child (the previous brick) and one parent (the next brick). An MLP is a superBrick composed of subBricks.
- Why do we need to call mlp.push_initialization_config when we override the initialization schemes of the children? Would it be possible to remove this step?
- "By default, all our parameters are set to 0." It seems dangerous to initialize the parameters to zero when they are allocated. Users might forget to call the initialize method.
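To make the last two points concrete, here is a toy sketch in plain Python. It is not the Blocks API: ToyBrick, config_pushed and constant are hypothetical names I made up, and only the method names initialize, apply and push_initialization_config mirror the tutorial. It assumes that parameters are allocated as zeros and that initialize pushes the parent's scheme to the children unless it has already been pushed, which is only my guess at why the explicit push is needed before an override.

```python
class ToyBrick:
    """Hypothetical stand-in for a brick; this is NOT the Blocks API."""

    def __init__(self, dim, weights_init=None, children=()):
        self.dim = dim
        self.weights_init = weights_init  # zero-argument callable or None
        self.children = list(children)
        self.config_pushed = False
        # Assumption from the tutorial quote: parameters are allocated as zeros.
        self.W = [0.0] * dim

    def push_initialization_config(self):
        # Copy this brick's scheme to each direct child, then recurse.
        for child in self.children:
            child.weights_init = self.weights_init
            child.push_initialization_config()
        self.config_pushed = True

    def initialize(self):
        # Assumed behaviour: push only if nobody has pushed before, so an
        # override made after an explicit push is not overwritten here.
        if not self.config_pushed:
            self.push_initialization_config()
        if self.weights_init is not None:
            self.W = [self.weights_init() for _ in range(self.dim)]
        for child in self.children:
            child.initialize()

    def apply(self, x):
        # Nothing prevents applying a brick that was never initialized.
        return sum(w * xi for w, xi in zip(self.W, x))


def constant(value):
    """Hypothetical initialization scheme: always return the same value."""
    return lambda: value


# Case 1: override a child without pushing first; initialize() overwrites it.
parent = ToyBrick(2, constant(0.1), [ToyBrick(2), ToyBrick(2)])
parent.children[0].weights_init = constant(0.5)
parent.initialize()
lost_override = parent.children[0].W

# Case 2: push explicitly, then override; the override survives.
parent = ToyBrick(2, constant(0.1), [ToyBrick(2), ToyBrick(2)])
parent.push_initialization_config()
parent.children[0].weights_init = constant(0.5)
parent.initialize()
kept_override = parent.children[0].W

# Case 3: forget initialize() entirely; apply() silently works on zeros.
forgotten = ToyBrick(2, constant(0.1))
silent_zero = forgotten.apply([1.0, 2.0])
```

In this toy version the override is lost in case 1 and kept in case 2, and case 3 returns zero without any warning. If the real behaviour is similar, an error or warning when apply is used on an uninitialized brick would address my last point.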
I will open a PR fixing the few typos I found in the tutorial.