Arraymancer
Future direction for memory management?
I was hoping to clarify what your long-term goal is for memory management. #150 suggests that you eventually want to add a `=sink` operator, but that seems to conflict with the idea of using reference semantics. You had toyed with moving away from that (#157) but ultimately decided against it.
Your current approach does seem to conflict with the idiomatic way of doing things in Nim and makes it difficult, verging on meaningless, to use `sink`/`lent` annotations with custom types containing a Tensor. It would seem to me that the most idiomatic way to handle this in Nim would be to make Tensors use value semantics by default, using Nim's `=destroy` operator for destruction and the `=sink` operator for optimisation. If a user then needs reference semantics, they can just use a `ref Tensor` type.
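For concreteness, here is a minimal sketch of what that could look like, assuming a hypothetical minimal `Tensor` type (none of these names or hooks come from Arraymancer itself):

```nim
# Hypothetical value-semantics tensor managed via Nim's lifetime hooks.
type
  Tensor[T] = object
    size: int
    data: ptr UncheckedArray[T]

proc `=destroy`[T](t: var Tensor[T]) =
  if t.data != nil:
    dealloc(t.data)

proc `=sink`[T](dst: var Tensor[T], src: Tensor[T]) =
  # Move optimisation: steal the buffer instead of copying it.
  `=destroy`(dst)
  dst.size = src.size
  dst.data = src.data

proc `=copy`[T](dst: var Tensor[T], src: Tensor[T]) =
  # Plain assignment deep-copies, giving value semantics.
  if dst.data == src.data: return
  `=destroy`(dst)
  dst.size = src.size
  dst.data = cast[ptr UncheckedArray[T]](alloc(src.size * sizeof(T)))
  copyMem(dst.data, src.data, src.size * sizeof(T))

# And when sharing is actually wanted:
# var shared: ref Tensor[float] = new(Tensor[float])
```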
Of course, I fully appreciate that this would be a significant (and breaking) change, so I don't expect you to suddenly make it on account of this feature request. Still, I thought it was worth raising.
Related: do you know if/how well Arraymancer works with orc/arc?
> Related: do you know if/how well Arraymancer works with orc/arc?
According to all my local testing Arraymancer works very well with ARC since #420 / #477. There are nimble test tasks defined here:
https://github.com/mratsim/Arraymancer/blob/master/arraymancer.nimble#L174-L178
and further below for `release`.
These are not run in CI yet, because I didn't want to modify the CI at the same time as #477. I'll open a PR "soon" to replace Travis with GitHub Actions and will include the test suite using ARC.
Arraymancer will keep reference semantics.
Let's take a training loop as an example:
```nim
# Learning loop
for epoch in 0 .. 10000:
  for batch_id in 0 ..< 100:
    # Minibatch offset in the Tensor
    let offset = batch_id * 32
    let x = x_train[offset ..< offset + 32, _]
    let target = y[offset ..< offset + 32, _]

    # Building the network
    let n1 = relu linear(x, layer_3neurons)
    let n2 = linear(n1, classifier_layer)
    let loss = n2.sigmoid_cross_entropy(target)

    # Compute the gradient (i.e. contribution of each parameter to the loss)
    loss.backprop()

    # Correct the weights now that we have the gradient information
    optim.update()
```
Copy-on-write:
- has a large tracking overhead, especially when tensors are sliced in a loop like in a machine learning training loop.
- Code reorganization can suddenly change the runtime behavior because copies are removed or introduced, making developers more wary of introducing changes
- Introducing debug prints can suddenly change the runtime behavior
- Performance-oriented devs have no escape hatch when copy-on-write isn't wanted.
- It cannot be implemented easily on GPUs
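To illustrate the tracking overhead, a hedged sketch (hypothetical types, not Arraymancer code) of the bookkeeping a copy-on-write tensor needs on every slice and every mutation:

```nim
# Sketch of copy-on-write bookkeeping: every slice bumps a share count,
# and every mutating access must check it and possibly deep-copy first.
type
  Buffer[T] = ref object
    shares: int       # how many tensors alias this storage
    data: seq[T]
  CowTensor[T] = object
    buf: Buffer[T]

proc slice[T](t: CowTensor[T]): CowTensor[T] =
  inc t.buf.shares    # overhead paid on each slice in the training loop
  result.buf = t.buf

proc prepareWrite[T](t: var CowTensor[T]) =
  # Must run before *every* write; this is the tracking overhead.
  if t.buf.shares > 1:
    dec t.buf.shares
    t.buf = Buffer[T](shares: 1, data: t.buf.data)  # copy triggered here
```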
Value semantics:
- introduce unnecessary copies, burdening the developer with low-level performance considerations. This is also very problematic on GPUs.
- can be implemented with custom memory management (Cuda, Vulkan or Laser) but are more cumbersome than ref semantics due to the need to overload `=copy`. Also, memory on GPUs is scarce: it's harder to debug the fact that any assignment can run out of memory than an explicit `clone` procedure.
- depart from familiar usage in Python, R, Julia.
- are predictable, but require multiple escape hatches: `unsafeView`, `unsafeReshape`, `unsafeTranspose`, `unsafeSlice`, `unsafeBroadcast`, ... for no-copy versions. That is because you pay a copy on assignment of return values.
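A sketch of why those per-proc escape hatches exist under value semantics (a hypothetical minimal `Tensor`, not the real API; `shallowCopy` is the classic-GC aliasing helper):

```nim
type
  Tensor[T] = object
    shape: array[2, int]
    data: seq[T]          # value semantics: assignment deep-copies

proc transpose[T](t: Tensor[T]): Tensor[T] =
  # The safe version pays a full buffer copy when the result is assigned.
  result.shape = [t.shape[1], t.shape[0]]
  result.data = t.data

proc unsafeTranspose[T](t: Tensor[T]): Tensor[T] =
  # The no-copy variant has to be a separate escape-hatch proc.
  result.shape = [t.shape[1], t.shape[0]]
  shallowCopy(result.data, t.data)  # alias the buffer instead of copying
```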
Reference semantics:
- are error-prone for the untrained, but Python, Java, Ruby, R and Julia use reference semantics all the way down, so it's likely less of a problem.
- are predictable; the single escape hatch is `clone`, so there is no need for a per-proc escape hatch.
- easy to implement on GPU.
- virtually all numerical computing code is written with reference semantics, which eases porting.
- can easily be used with unowned buffers (for example integrating with Numpy).
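For contrast, a sketch of the reference-semantics layout with `clone` as the single escape hatch (hypothetical names, not Arraymancer's internals):

```nim
type
  Storage[T] = ref object  # shared, garbage-collected buffer
    data: seq[T]
  Tensor[T] = object
    shape: array[2, int]
    storage: Storage[T]

proc transpose[T](t: Tensor[T]): Tensor[T] =
  # No copy: the result aliases the same storage; metadata-only change.
  result.shape = [t.shape[1], t.shape[0]]
  result.storage = t.storage

proc clone[T](t: Tensor[T]): Tensor[T] =
  # The single, explicit escape hatch: a deliberate deep copy.
  result.shape = t.shape
  result.storage = Storage[T](data: t.storage.data)
```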
In terms of implementation complexity:
- I kept the value semantics -> copy-on-write branch around (3 years old): https://github.com/mratsim/Arraymancer/compare/copy-on-write
- The value semantics -> ref semantics PR: https://github.com/mratsim/Arraymancer/pull/161/files
Obviously this is your project, so I'm not demanding you change it on my account or anything. However, I'm still not clear on what the (performance) value of reference semantics is, as opposed to using value semantics and letting a user choose a `ref Tensor[float]` type if they need to avoid making copies. Or would this require making two versions of every routine? If so, I'd have thought a custom pragma might be able to automate that. Or perhaps it could even be achieved with generics and an overloaded `newReturnTensor` procedure.
Furthermore, wouldn't it be possible to have routines like `transpose` return a tensor with reference semantics (i.e. a pointer to the same underlying memory as the input) but with a flag in the metadata which indicates to the overloaded `=sink` operator that it should perform a copy if it ever gets assigned to a variable? I haven't fully thought this through, so I could easily be missing something, of course. And none of this addresses the issues you raised about GPU implementations (about which I know very little).
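A rough sketch of that idea, with everything hypothetical (the `borrowed` flag and the simplified `=sink` hook are mine, not Arraymancer's; a real implementation would need to distinguish binding sites more carefully):

```nim
type
  Tensor[T] = object
    borrowed: bool          # metadata flag: buffer is owned elsewhere
    size: int
    data: ptr UncheckedArray[T]

proc `=sink`[T](dst: var Tensor[T], src: Tensor[T]) =
  # Simplified: assumes dst holds no buffer yet.
  dst.size = src.size
  dst.borrowed = false
  if src.borrowed:
    # A borrowed view materialises an owned copy when bound to a variable.
    dst.data = cast[ptr UncheckedArray[T]](alloc(src.size * sizeof(T)))
    copyMem(dst.data, src.data, src.size * sizeof(T))
  else:
    dst.data = src.data     # owned value: ownership transfers, no copy

proc transpose[T](t: Tensor[T]): Tensor[T] =
  # Returns a borrowed view over t's memory; metadata-only change.
  result = Tensor[T](borrowed: true, size: t.size, data: t.data)
```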
Regardless of that choice, how do you envision Arraymancer working with Nim's new destructors and `sink`/`lent` annotations? I suppose there isn't really that much benefit to `sink`/`lent` if the array data (everything other than metadata) is stored with reference semantics anyway. Would the new destructors have any use, or do you use manual reference counting?