Neural Arithmetic Logic Units

Open howardyclo opened this issue 6 years ago • 1 comments

Metadata

  • Authors: Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, Phil Blunsom
  • Organization: DeepMind
  • Conference: NIPS 2018
  • Paper: https://arxiv.org/pdf/1808.00508.pdf
  • Code: https://github.com/iamtrask/NALU-2

howardyclo avatar Apr 23 '19 04:04 howardyclo

TL;DR

Presents a simple module that can learn arithmetic functions such as addition, subtraction, multiplication, and division, and that generalizes well to unseen data, including numerical ranges outside those seen during training.

DNNs with Non-linearities Struggle to Learn the Identity Function

  • Train an autoencoder to reconstruct scalar inputs in the range [-5, 5].
  • All autoencoders share the same parameterization (3 hidden layers of size 8) and differ only in their non-linearity.
  • Trained with MSE loss.
  • When tested on [-20, 20], the error increases severely both below and above the range of numbers seen during training.
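
One way to see why extrapolation fails for a saturating non-linearity is that a tanh hidden layer followed by a linear readout can only produce bounded outputs. The sketch below is a simplification of the experiment above, not the paper's setup: it uses one hidden layer instead of three, and the weights are hypothetical values picked by hand rather than trained.

```python
import math

def tanh_net(x, w1, b1, w2, b2):
    """Scalar-in, scalar-out net: one tanh hidden layer, then a linear readout."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

def output_bound(w2, b2):
    """Upper bound on |output|, which holds because |tanh(z)| <= 1 for every z."""
    return sum(abs(v) for v in w2) + abs(b2)

# Hypothetical hand-picked weights (not trained): 8 hidden units whose sum
# is 10 * tanh(0.1 * x), which is close to x when |x| is small.
w1 = [0.1] * 8
b1 = [0.0] * 8
w2 = [1.25] * 8
b2 = 0.0

# Absolute reconstruction error at a point near zero, at the edge of the
# training range, and at the edge of the test range.
errors = {x: abs(tanh_net(x, w1, b1, w2, b2) - x) for x in (1.0, 5.0, 20.0)}
for x, err in errors.items():
    print(f"x = {x:5.1f}  |net(x) - x| = {err:.4f}")
print("output can never exceed", output_bound(w2, b2))
```

Whatever the weights are, the output of this network stays within the bound, so the reconstruction error must grow once |x| passes it. Non-saturating non-linearities are not covered by this argument.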

The Neural Accumulator (NAC) & Neural Arithmetic Logic Unit (NALU)

  • NAC: A special case of a linear layer whose weight matrix W is pushed towards values in {-1, 0, 1}, defined as:
    • W = tanh(\hat{W}) * σ(\hat{M}), where * is the element-wise product and \hat{W}, \hat{M} are unconstrained learned parameters.
    • The elements of W are guaranteed to be in [-1, 1] and are biased towards {-1, 0, 1} during learning, since {-1, 0, 1} correspond to the saturation points of either tanh(.) or σ(.).
    • Its outputs are additions or subtractions (rather than arbitrary rescalings) of rows in the input vector.
  • NALU: Learns a weighted sum of two sub-cells:
    • One is the original NAC, capable of learning to add and subtract.
    • The other operates in log space and is capable of learning to multiply and divide, using identities such as log(XY) = log X + log Y, log(X/Y) = log X - log Y, and exp(log(X)) = X.
    • Altogether, NALU can learn to perform general arithmetic operations.
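
The forward passes described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid gate g = σ(Gx), the log(|x| + ε) form of the log-space path, and the sharing of W between both paths follow the paper's formulation rather than these notes, and the function names and the saturated parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def nac_weights(w_hat, m_hat):
    """W = tanh(W_hat) * sigmoid(M_hat), element-wise, so every entry lies in [-1, 1]."""
    return [[math.tanh(w) * sigmoid(m) for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(w_hat, m_hat)]

def nac(x, w_hat, m_hat):
    """Neural accumulator: a linear map with the constrained W and no bias."""
    return matvec(nac_weights(w_hat, m_hat), x)

def nalu(x, w_hat, m_hat, g_weights, eps=1e-10):
    """Gated sum of an additive path and a log-space (multiplicative) path."""
    W = nac_weights(w_hat, m_hat)
    add_path = matvec(W, x)
    # Assumption: |x| + eps keeps the logarithm defined for zero or negative inputs.
    log_x = [math.log(abs(xi) + eps) for xi in x]
    mul_path = [math.exp(v) for v in matvec(W, log_x)]
    gate = [sigmoid(v) for v in matvec(g_weights, x)]
    return [g * a + (1.0 - g) * m for g, a, m in zip(gate, add_path, mul_path)]

# Hypothetical saturated parameters: W is close to [1, 1] or [1, -1].
x = [3.0, 4.0]
m_hat = [[20.0, 20.0]]
sum_out = nac(x, [[20.0, 20.0]], m_hat)[0]    # close to 3 + 4
diff_out = nac(x, [[20.0, -20.0]], m_hat)[0]  # close to 3 - 4

# A strongly negative gate pre-activation selects the log-space path.
closed_gate = [[-20.0, -20.0]]
prod_out = nalu(x, [[20.0, 20.0]], m_hat, closed_gate)[0]   # close to 3 * 4
quot_out = nalu(x, [[20.0, -20.0]], m_hat, closed_gate)[0]  # close to 3 / 4
```

With saturated parameters the same W acts as an add/subtract mask in the NAC and as a multiply/divide mask in log space; the gate only decides which path reaches the output.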

Limitations of a single NALU [Ref]

  • Can handle either add/subtract or mult/div operations but not a combination of both.
  • For mult/div operations, it cannot handle negative targets, because the mult/div gate output is the result of an exponentiation, which always yields positive results.
  • Power operations are only possible when the exponent is in the range [0, 1].
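
The sign and power limitations can be checked numerically on the log-space path alone. This is a small sketch assuming the exp(Σ w_i · log(|x_i| + ε)) form from the paper; the helper name `log_space_product` and the weights are hypothetical.

```python
import math

def log_space_product(x, w, eps=1e-10):
    """Multiplicative path: exp(sum_i w_i * log(|x_i| + eps))."""
    return math.exp(sum(wi * math.log(abs(xi) + eps) for wi, xi in zip(w, x)))

# The sign of the input is discarded by |x|, and exp(.) is always positive,
# so a target of -12 cannot be produced.
positive = log_space_product([3.0, 4.0], [1.0, 1.0])
negative = log_space_product([-3.0, 4.0], [1.0, 1.0])

# A fractional weight acts as a fractional power of the input (here a square root).
sqrt_val = log_space_product([9.0], [0.5])

print(positive, negative, sqrt_val)
```

Because each entry of W is a product of tanh and σ, its magnitude cannot exceed 1, so squaring a single input would need a weight that the parameterization cannot represent.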

Related Work

howardyclo avatar May 04 '19 09:05 howardyclo