Arraymancer
Future direction for memory management?
I was hoping to clarify what your long-term goal is for memory management. #150 suggests that you eventually want to add a `=sink` operator, but that seems to conflict with the idea of using reference semantics. You had toyed with moving away from that (#157) but ultimately decided against it.
Your current approach does seem to conflict with the idiomatic way of doing things in Nim and makes it difficult, verging on meaningless, to use `sink`/`lent` annotations with custom types containing a Tensor. It would seem to me that the most idiomatic way to handle this in Nim would be to make Tensors use value semantics by default, using Nim's `=destroy` operator for destruction and the `=sink` operator for optimisation. If a user then needs reference semantics, they can just use a `ref Tensor` type.
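For concreteness, here is a minimal sketch of what that could look like, assuming a hypothetical minimal `Tensor` type (none of these names or hooks come from Arraymancer itself):

```nim
# Hypothetical value-semantics tensor managed via Nim's lifetime hooks.
type
  Tensor[T] = object
    size: int
    data: ptr UncheckedArray[T]

proc `=destroy`[T](t: var Tensor[T]) =
  if t.data != nil:
    dealloc(t.data)

proc `=sink`[T](dst: var Tensor[T], src: Tensor[T]) =
  # Move optimisation: steal the buffer instead of copying it.
  `=destroy`(dst)
  dst.size = src.size
  dst.data = src.data

proc `=copy`[T](dst: var Tensor[T], src: Tensor[T]) =
  # Plain assignment deep-copies, giving value semantics.
  if dst.data == src.data: return
  `=destroy`(dst)
  dst.size = src.size
  dst.data = cast[ptr UncheckedArray[T]](alloc(src.size * sizeof(T)))
  copyMem(dst.data, src.data, src.size * sizeof(T))

# And when sharing is actually wanted:
# var shared: ref Tensor[float] = new(Tensor[float])
```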
Of course, I fully appreciate that this would be a significant (and breaking) change, so I don't expect you to suddenly make it on account of this feature request. Still, I thought it was worth raising.
Related: do you know if/how well Arraymancer works with orc/arc?
> Related: do you know if/how well Arraymancer works with orc/arc?
According to all my local testing Arraymancer works very well with ARC since #420 / #477. There are nimble test tasks defined here:
https://github.com/mratsim/Arraymancer/blob/master/arraymancer.nimble#L174-L178
and further below for `release`.
These are not run in CI yet, because I didn't want to modify the CI at the same time as #477. I'll open a PR "soon" to replace Travis with GitHub Actions and will include the test suite using ARC.
Arraymancer will keep reference semantics.
Let's take a training loop as an example:
```nim
# Learning loop
for epoch in 0 .. 10000:
  for batch_id in 0 ..< 100:
    # Minibatch offset in the Tensor
    let offset = batch_id * 32
    let x = x_train[offset ..< offset + 32, _]
    let target = y[offset ..< offset + 32, _]

    # Building the network
    let n1 = relu linear(x, layer_3neurons)
    let n2 = linear(n1, classifier_layer)
    let loss = n2.sigmoid_cross_entropy(target)

    # Compute the gradient (i.e. contribution of each parameter to the loss)
    loss.backprop()

    # Correct the weights now that we have the gradient information
    optim.update()
```
Copy-on-write:
- has a large tracking overhead, especially when tensors are sliced in a loop like in a machine learning training loop.
- Code reorganization can suddenly change the runtime behavior because copies are removed or introduced, making developers more wary of introducing changes
- Introducing debug prints can suddenly change the runtime behavior
- Performance-oriented devs have no escape hatch when copy-on-write isn't wanted.
- It cannot be implemented easily on GPUs
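To illustrate the tracking overhead, a hedged sketch (hypothetical types, not Arraymancer code) of the bookkeeping a copy-on-write tensor needs on every slice and every mutation:

```nim
# Sketch of copy-on-write bookkeeping: every slice bumps a share count,
# and every mutating access must check it and possibly deep-copy first.
type
  Buffer[T] = ref object
    shares: int       # how many tensors alias this storage
    data: seq[T]
  CowTensor[T] = object
    buf: Buffer[T]

proc slice[T](t: CowTensor[T]): CowTensor[T] =
  inc t.buf.shares    # overhead paid on each slice in the training loop
  result.buf = t.buf

proc prepareWrite[T](t: var CowTensor[T]) =
  # Must run before *every* write; this is the tracking overhead.
  if t.buf.shares > 1:
    dec t.buf.shares
    t.buf = Buffer[T](shares: 1, data: t.buf.data)  # copy triggered here
```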
Value semantics:
- introduce unnecessary copies, burdening the developer with low-level performance considerations. This is also very problematic on GPUs.
- can be implemented with custom memory management (Cuda, Vulkan or Laser) but are more cumbersome than ref semantics due to the need to overload `=copy`. Also, memory on GPUs is scarce: it's harder to debug the fact that any assignment can run out of memory than an explicit `clone` procedure.
- depart from familiar usage in Python, R, Julia.
- are predictable, but require multiple escape hatches: `unsafeView`, `unsafeReshape`, `unsafeTranspose`, `unsafeSlice`, `unsafeBroadcast`, ... for no-copy versions. That is because you pay a copy on assignment of return values.
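A sketch of why those per-proc escape hatches exist under value semantics (a hypothetical minimal `Tensor`, not the real API; `shallowCopy` is the classic-GC aliasing helper):

```nim
type
  Tensor[T] = object
    shape: array[2, int]
    data: seq[T]          # value semantics: assignment deep-copies

proc transpose[T](t: Tensor[T]): Tensor[T] =
  # The safe version pays a full buffer copy when the result is assigned.
  result.shape = [t.shape[1], t.shape[0]]
  result.data = t.data

proc unsafeTranspose[T](t: Tensor[T]): Tensor[T] =
  # The no-copy variant has to be a separate escape-hatch proc.
  result.shape = [t.shape[1], t.shape[0]]
  shallowCopy(result.data, t.data)  # alias the buffer instead of copying
```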
Reference semantics:
- are error-prone for the untrained, but Python, Java, Ruby, R and Julia use reference semantics all the way down, so it's likely less of a problem.
- are predictable; the single escape hatch is `clone`, so there is no need for a per-proc escape hatch.
- easy to implement on GPU.
- virtually all numerical computing code is written with reference semantics, which eases porting.
- can easily be used with unowned buffers (for example integrating with Numpy).
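For contrast, a sketch of the reference-semantics layout with `clone` as the single escape hatch (hypothetical names, not Arraymancer's internals):

```nim
type
  Storage[T] = ref object  # shared, garbage-collected buffer
    data: seq[T]
  Tensor[T] = object
    shape: array[2, int]
    storage: Storage[T]

proc transpose[T](t: Tensor[T]): Tensor[T] =
  # No copy: the result aliases the same storage; metadata-only change.
  result.shape = [t.shape[1], t.shape[0]]
  result.storage = t.storage

proc clone[T](t: Tensor[T]): Tensor[T] =
  # The single, explicit escape hatch: a deliberate deep copy.
  result.shape = t.shape
  result.storage = Storage[T](data: t.storage.data)
```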
In terms of implementation complexity:
- I kept the value semantics -> copy-on-write branch around (3 years old): https://github.com/mratsim/Arraymancer/compare/copy-on-write
- The value semantics -> ref semantics PR: https://github.com/mratsim/Arraymancer/pull/161/files
Obviously this is your project, so I'm not demanding you change it on my account or anything. However, I'm still not clear on what the (performance) value of reference semantics is, as opposed to using value semantics and letting a user choose a `ref Tensor[float]` type if they need to avoid making copies. Or would this require making two versions of every routine? If so, I'd have thought a custom pragma might be able to automate that. Or perhaps it could even be achieved with generics and an overloaded `newReturnTensor` procedure.
Furthermore, wouldn't it be possible to have routines like `transpose` return a tensor with reference semantics (i.e. a pointer to the same underlying memory as the input) but with a flag in the metadata which indicates to the overloaded `=sink` operator that it should perform a copy if it ever gets assigned to a variable? I haven't fully thought this through, so I could easily be missing something, of course. And none of this addresses the issues you raised about GPU implementations (about which I know very little).
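A rough sketch of that idea, with everything hypothetical (the `borrowed` flag and the simplified `=sink` hook are mine, not Arraymancer's; a real implementation would need to distinguish binding sites more carefully):

```nim
type
  Tensor[T] = object
    borrowed: bool          # metadata flag: buffer is owned elsewhere
    size: int
    data: ptr UncheckedArray[T]

proc `=sink`[T](dst: var Tensor[T], src: Tensor[T]) =
  # Simplified: assumes dst holds no buffer yet.
  dst.size = src.size
  dst.borrowed = false
  if src.borrowed:
    # A borrowed view materialises an owned copy when bound to a variable.
    dst.data = cast[ptr UncheckedArray[T]](alloc(src.size * sizeof(T)))
    copyMem(dst.data, src.data, src.size * sizeof(T))
  else:
    dst.data = src.data     # owned value: ownership transfers, no copy

proc transpose[T](t: Tensor[T]): Tensor[T] =
  # Returns a borrowed view over t's memory; metadata-only change.
  result = Tensor[T](borrowed: true, size: t.size, data: t.data)
```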
Regardless of that choice, how do you envision Arraymancer working with Nim's new destructors and `sink`/`lent` annotations? I suppose there isn't really that much benefit to `sink`/`lent` if the array data (everything other than metadata) is stored with reference semantics anyway. Would the new destructors have any use, or do you use manual reference counting?