
Initialization discussion

Open · adbrebs opened this issue 9 years ago · 10 comments

Hi all,

First, well done on all your hard work! I have just read the tutorial and have a few remarks and questions. They may sound critical, but don't get me wrong: I'm sure your design choices are well thought out, and I just want to understand them.

  1. Why do we need to call brick.initialize explicitly? Couldn't initialization be done when apply is called, or during allocation?
  2. When is lazy initialization (wouldn't "lazy configuration" be a more suitable term?) necessary or convenient? The example given in the tutorial (connecting two networks with a linear layer) could be written just as easily without lazy initialization. If it is not necessary or convenient, allowing too much flexibility may be dangerous, especially if Blocks wants to stay fairly low-level. Having several ways to do the same thing can confuse users and is tedious for us to develop and maintain (the smaller the library, the better).
  3. "Many neural network models, especially more complex ones, can be considered hierarchical structures. Even a simple multi-layer perceptron consists of layers, which in turn consist of a linear transformation followed by a non-linear transformation. As such, bricks can have children." The terminology (children, parent) might be slightly confusing: in an MLP it reads like a sequence of bricks in which each brick has one child (the previous brick) and one parent (the next brick), whereas what is meant is composition. The MLP is a superBrick composed of subBricks.
  4. Why do we need to call mlp.push_initialization_config when we override the initialization schemes of the children? Could this step be removed?
  5. "By default, all our parameters are set to 0." Initializing the parameters to zero when they are allocated seems dangerous, since users might forget to call the initialize method and silently train from all-zero weights.
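A toy sketch may help make the workflow behind points 1, 4 and 5 concrete. The classes below (`ToyLinear`, `ToyMLP`, and the `config_pushed` flag) are hypothetical plain-Python stand-ins written for this discussion, not the Blocks API; they only mimic the allocate / push configuration / initialize sequence as I understand it from the tutorial, under the assumption that initialize pushes the parent's configuration only if that has not already been done.

```python
class ToyLinear:
    """Hypothetical leaf brick with one weight matrix; not the Blocks API."""

    def __init__(self, input_dim=None, output_dim=None, weights_init=None):
        # Lazy configuration: any of these may stay None until a parent fills it in.
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.weights_init = weights_init
        self.W = None

    def allocate(self):
        # Allocation only creates storage; every entry starts at 0.0.
        self.W = [[0.0] * self.output_dim for _ in range(self.input_dim)]

    def initialize(self):
        # Initialization overwrites the zeros using this brick's own scheme.
        if self.W is None:
            self.allocate()
        self.W = [[self.weights_init() for _ in row] for row in self.W]


class ToyMLP:
    """Hypothetical parent brick that configures its ToyLinear children."""

    def __init__(self, dims, weights_init):
        self.dims = dims
        self.weights_init = weights_init
        # Children are created unconfigured (lazy configuration).
        self.children = [ToyLinear() for _ in dims[1:]]
        self.config_pushed = False

    def push_allocation_config(self):
        # The parent decides each child's dimensions from its own `dims`.
        pairs = zip(self.dims[:-1], self.dims[1:])
        for child, (d_in, d_out) in zip(self.children, pairs):
            child.input_dim, child.output_dim = d_in, d_out

    def allocate(self):
        self.push_allocation_config()
        for child in self.children:
            child.allocate()

    def push_initialization_config(self):
        # Copy the parent's scheme to every child, overwriting theirs.
        for child in self.children:
            child.weights_init = self.weights_init
        self.config_pushed = True

    def initialize(self):
        # Push only if the user has not already done so; otherwise a per-child
        # override made after an explicit push would be clobbered here.
        if not self.config_pushed:
            self.push_initialization_config()
        if any(child.W is None for child in self.children):
            self.allocate()
        for child in self.children:
            child.initialize()


mlp = ToyMLP(dims=[3, 4, 2], weights_init=lambda: 0.5)
mlp.allocate()
print(mlp.children[0].W[0])  # [0.0, 0.0, 0.0, 0.0]: still zeros before initialize()

mlp.push_initialization_config()             # parent scheme reaches the children
mlp.children[1].weights_init = lambda: -1.0  # then override one child
mlp.initialize()                             # no re-push, so the override survives
print(mlp.children[0].W[0], mlp.children[1].W[0])  # [0.5, 0.5, 0.5, 0.5] [-1.0, -1.0]
```

In this toy design, overriding a child's scheme without first calling push_initialization_config is silently undone by initialize, which is one plausible reason for the extra step in question 4; and skipping initialize leaves every weight at 0.0, which is exactly the risk raised in question 5.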

I will open a PR fixing the few typos I found in the tutorial.

adbrebs, Mar 01 '15 14:03