
[NN DSL - Breaking] the forward proc declaration is too restrictive on input types

Open · mratsim opened this issue 5 years ago · 0 comments

Taking the definition from example 5:

network ctx, TheGreatSequencer:
  layers:
    # Note input_shape will only require the number of features in the future
    # Input shape = [seq_len, Batch, features]
    gru1: GRU([3, Batch_size, 1], HiddenSize, 4) # (input_shape, hidden_size, stacked_layers)
    fc1: Linear(HiddenSize, 32)                  # 1 classifier per GRU layer
    fc2: Linear(HiddenSize, 32)
    fc3: Linear(HiddenSize, 32)
    fc4: Linear(HiddenSize, 32)
    classifier: Linear(32 * 4, 3)                # Stacking a classifier which learns from the other 4
  forward x, hidden0:
    let
      (output, hiddenN) = gru1(x, hidden0)
      clf1 = hiddenN[0, _, _].squeeze(0).fc1.relu
      clf2 = hiddenN[1, _, _].squeeze(0).fc2.relu
      clf3 = hiddenN[2, _, _].squeeze(0).fc3.relu
      clf4 = hiddenN[3, _, _].squeeze(0).fc4.relu

    # Concat all
    # Since concat backprop is not implemented we cheat by stacking
    # Then flatten
    result = stack(clf1, clf2, clf3, clf4, axis = 2)
    result = classifier(result.flatten)

Unfortunately, the DSL forces x and hidden0 to conform to the type of ctx (here Variable[Tensor[float32]]).

This does not work with embedding layers, which require Tensor[int] input, so the DSL must be changed to accept an explicit proc declaration:

network ctx, TheGreatSequencer:
  layers:
    # Note input_shape will only require the number of features in the future
    # Input shape = [seq_len, Batch, features]
    gru1: GRU([3, Batch_size, 1], HiddenSize, 4) # (input_shape, hidden_size, stacked_layers)
    fc1: Linear(HiddenSize, 32)                  # 1 classifier per GRU layer
    fc2: Linear(HiddenSize, 32)
    fc3: Linear(HiddenSize, 32)
    fc4: Linear(HiddenSize, 32)
    classifier: Linear(32 * 4, 3)                # Stacking a classifier which learns from the other 4
  proc forward(x, hidden0: Variable[Tensor[float32]]): Variable[Tensor[float32]] =
    let
      (output, hiddenN) = gru1(x, hidden0)
      clf1 = hiddenN[0, _, _].squeeze(0).fc1.relu
      clf2 = hiddenN[1, _, _].squeeze(0).fc2.relu
      clf3 = hiddenN[2, _, _].squeeze(0).fc3.relu
      clf4 = hiddenN[3, _, _].squeeze(0).fc4.relu

    # Concat all
    # Since concat backprop is not implemented we cheat by stacking
    # Then flatten
    result = stack(clf1, clf2, clf3, clf4, axis = 2)
    result = classifier(result.flatten)

so that arbitrary input types can be used in the future.
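
For example, an embedding-based network could then declare its integer inputs directly. The sketch below is purely illustrative: the Embedding constructor, the WordClassifier network and the VocabSize/EmbedSize/NClasses constants are hypothetical and not part of the current DSL.

network ctx, WordClassifier:
  layers:
    # Hypothetical Embedding layer: looks up float32 vectors from integer ids
    embed: Embedding(VocabSize, EmbedSize)
    clf:   Linear(EmbedSize, NClasses)
  proc forward(word_ids: Tensor[int]): Variable[Tensor[float32]] =
    # word_ids is a raw integer tensor, not a Variable[Tensor[float32]];
    # the current forward declaration cannot accept this type.
    result = word_ids.embed.clf

With the current forward declaration, word_ids would be forced to Variable[Tensor[float32]] and this network could not be written.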

Ergonomics

The Variable input is usually not needed. Instead, we can create overloads of each NN layer that wrap input tensors in a Variable with requires_grad = false.

i.e. having:

proc linear*[TT](input, weight: Variable[TT], bias: Variable[TT] = nil): Variable[TT] =
  ...

and

proc linear*[TT](input: TT, weight: Variable[TT], bias: Variable[TT] = nil): Variable[TT] =
  ...
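
A minimal sketch of how the second overload could forward to the first, assuming the Variable exposes its context and that variable(ctx, tensor, requires_grad = false) is available in the autograd module (the exact forwarding below is an assumption, not the current implementation):

proc linear*[TT](input: TT, weight: Variable[TT], bias: Variable[TT] = nil): Variable[TT] =
  # Wrap the raw tensor in a no-grad Variable on the weight's context,
  # then defer to the existing Variable-based overload.
  let x = weight.context.variable(input, requires_grad = false)
  result = linear(x, weight, bias)

This keeps the input out of gradient tracking while reusing the existing forward/backward implementation.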

mratsim · Dec 08 '18 11:12