Arraymancer
[NN DSL - Breaking] the forward proc declaration is too restrictive on input types
Taking the definition from example 5:
```nim
network ctx, TheGreatSequencer:
  layers:
    # Note input_shape will only require the number of features in the future
    # Input shape = [seq_len, Batch, features]
    gru1: GRU([3, Batch_size, 1], HiddenSize, 4) # (input_shape, hidden_size, stacked_layers)
    fc1: Linear(HiddenSize, 32)                  # 1 classifier per GRU layer
    fc2: Linear(HiddenSize, 32)
    fc3: Linear(HiddenSize, 32)
    fc4: Linear(HiddenSize, 32)
    classifier: Linear(32 * 4, 3)                # Stacking a classifier which learns from the other 4
  forward x, hidden0:
    let
      (output, hiddenN) = gru1(x, hidden0)
      clf1 = hiddenN[0, _, _].squeeze(0).fc1.relu
      clf2 = hiddenN[1, _, _].squeeze(0).fc2.relu
      clf3 = hiddenN[2, _, _].squeeze(0).fc3.relu
      clf4 = hiddenN[3, _, _].squeeze(0).fc4.relu
    # Concat all
    # Since concat backprop is not implemented we cheat by stacking
    # Then flatten
    result = stack(clf1, clf2, clf3, clf4, axis = 2)
    result = classifier(result.flatten)
```
Unfortunately, the DSL forces `x` and `hidden0` to conform to the type of `ctx`. This does not work with embedding layers, which require a `Tensor[int]` input, so the DSL must be changed to accept:
```nim
network ctx, TheGreatSequencer:
  layers:
    # Note input_shape will only require the number of features in the future
    # Input shape = [seq_len, Batch, features]
    gru1: GRU([3, Batch_size, 1], HiddenSize, 4) # (input_shape, hidden_size, stacked_layers)
    fc1: Linear(HiddenSize, 32)                  # 1 classifier per GRU layer
    fc2: Linear(HiddenSize, 32)
    fc3: Linear(HiddenSize, 32)
    fc4: Linear(HiddenSize, 32)
    classifier: Linear(32 * 4, 3)                # Stacking a classifier which learns from the other 4
  proc forward(x, hidden0: Variable[Tensor[float32]]): Variable[Tensor[float32]] =
    let
      (output, hiddenN) = gru1(x, hidden0)
      clf1 = hiddenN[0, _, _].squeeze(0).fc1.relu
      clf2 = hiddenN[1, _, _].squeeze(0).fc2.relu
      clf3 = hiddenN[2, _, _].squeeze(0).fc3.relu
      clf4 = hiddenN[3, _, _].squeeze(0).fc4.relu
    # Concat all
    # Since concat backprop is not implemented we cheat by stacking
    # Then flatten
    result = stack(clf1, clf2, clf3, clf4, axis = 2)
    result = classifier(result.flatten)
```
so that we can use any input types in the future.
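For illustration, a relaxed declaration would allow a forward proc to mix input types, for example feeding integer token indices to an embedding layer while the hidden state and output remain `float32`. The sketch below is hypothetical: `embed1` is an assumed embedding layer name used only for this example, not part of the network above.

```nim
# Hypothetical sketch only: `embed1` is an assumed embedding layer that maps
# Tensor[int] token indices to Variable[Tensor[float32]] embeddings.
proc forward(x: Tensor[int], hidden0: Variable[Tensor[float32]]): Variable[Tensor[float32]] =
  let
    embedded = embed1(x)                        # Tensor[int] -> Variable[Tensor[float32]]
    (output, hiddenN) = gru1(embedded, hidden0) # GRU operates on float32 Variables
  result = hiddenN[0, _, _].squeeze(0).fc1.relu
```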
Ergonomics
The `Variable` input is usually not needed. Instead, we can create overloads of each NN layer that wrap input tensors in a `Variable` with `requires_grad = false`.
i.e. having:
```nim
proc linear*[TT](input, weight: Variable[TT], bias: Variable[TT] = nil): Variable[TT] =
  ...
```
and
```nim
proc linear*[TT](input: TT, weight: Variable[TT], bias: Variable[TT] = nil): Variable[TT] =
  ...
```