ElasticArrays.jl

Restriction to the last dimension

Open ChrisRackauckas opened this issue 8 years ago • 27 comments

Why the restriction to the last dimension? Is this due to the design or could that be changed?

ChrisRackauckas avatar Oct 10 '17 15:10 ChrisRackauckas

It's currently due to design - ElasticArray should be contiguous and fast (basically native array speed, except for a slightly more expensive size function). The internal vector reference, kernel size, etc. are immutable.

ElasticArray is mainly intended for storing/streaming incoming chunks of multi-dimensional data. I also want to use it as a back-end for a vector-of-arrays wrapper that I'm writing (see branch "arrayvector") and a few other things, so usually, limiting resize to the last dimension is not a problem. But if it can be done without loss of performance (and I have to admit that I haven't done extensive performance tests yet), multi-dimensional resize would be even better, of course.

oschulz avatar Oct 10 '17 17:10 oschulz

Oh, I forgot: One important design criterion is that an ElasticArray should be able to "breathe" (grow, shrink, grow, ...) without causing a high memory allocation and GC load, to support multithreaded high-throughput applications. That's why I'm using a Vector and its native GC-friendly resize capabilities (space stays reserved after shrink) internally.
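A quick illustration of that Vector behavior (a sketch, not package code):

buf = rand(10^6)
p = pointer(buf)
resize!(buf, 0)        # logical shrink; Julia keeps the storage reserved
resize!(buf, 10^6)     # grow back within the old capacity (contents undefined)
pointer(buf) == p      # typically true: no fresh allocation, no GC pressure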

oschulz avatar Oct 10 '17 17:10 oschulz

I did a few tests and it looks like ElasticArray might actually be faster (at least sometimes) if it's a mutable struct (expert advice would be very welcome). If ElasticArray were mutable, then it could, in principle, be resizable in all dimensions.

But changing the size of any dimension but the last would be very expensive, as it would require memory allocation and data movement. @ChrisRackauckas, do you have any specific use case(s) in mind?

oschulz avatar Oct 11 '17 10:10 oschulz

@ChrisRackauckas, do you have any specific use case(s) in mind?

I want to change the size of Jacobian matrices but attempt to re-use some of the same memory / not re-allocate the whole thing. I am not sure if it's possible.

ChrisRackauckas avatar Oct 11 '17 13:10 ChrisRackauckas

You could try manually with

julia> buf = zeros(Int, 10)
10-element Array{Int64,1}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0

julia> a = Base.ReshapedArray(buf, (5, 2), ())
5×2 reshape(::Array{Int64,1}, 5, 2) with eltype Int64:
 0  0
 0  0
 0  0
 0  0
 0  0

julia> push!(buf, 3)
11-element Array{Int64,1}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 3

julia> a
5×2 reshape(::Array{Int64,1}, 5, 2) with eltype Int64:
 0  0
 0  0
 0  0
 0  0
 0  0

julia> buf[2] = 7
7

julia> a
5×2 reshape(::Array{Int64,1}, 5, 2) with eltype Int64:
 0  0
 7  0
 0  0
 0  0
 0  0

You have to create a new wrapper every time you want to resize. But this approach lets you live dangerously and circumvent the "cannot resize array with shared data" error.
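For instance, continuing the session above, a grow-and-rewrap step might look like this (a sketch; the regrown tail is uninitialized, so outputs are suppressed):

julia> resize!(buf, 15);        # grow the buffer to the new total length

julia> a = Base.ReshapedArray(buf, (5, 3), ());  # fresh wrapper over the same memory

julia> size(a)
(5, 3)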

The best reason to try it this way first is that you might save yourself some time: I'd be a little surprised if this helps that much, unless you're talking about very small arrays. These days Julia's GC is pretty impressive.

timholy avatar Oct 11 '17 14:10 timholy

Although I just realized I pulled such tricks in TiledIteration, so maybe...

timholy avatar Oct 11 '17 14:10 timholy

I want to change the size of Jacobian matrices but attempt to re-use some of the same memory / not re-allocate the whole thing. I am not sure if it's possible.

It might be - if it turns out that ElasticArray won't do any worse as a mutable struct, it would be possible to implement it.

Implementing the necessary data movement by hand would be a bit of work, though we could just do it via temporary views for now and live with the alloc/GC cost of those views (I wish we had stack-allocated views already ...). And it might not cost anything in use cases where the array contents don't need to be preserved during resize - I guess that's the case for you, Chris? There, one could resize to size (0,0,...) and then blow the array up again to the desired new size, resulting in no data movement; memory allocation would only occur occasionally, when the new total length is greater than ever before in the array's history. Would that fit your use case?
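With the package's existing last-dimension API, that shrink-and-regrow pattern would look roughly like this (sizes made up):

using ElasticArrays
A = ElasticArray{Float64}(undef, 4, 0)  # 4 rows, growable last dimension
append!(A, rand(4, 10))                 # grow to 4×10
resize!(A, 4, 0)                        # shrink: internal storage stays reserved
append!(A, rand(4, 25))                 # regrow: allocates only because 4×25 exceeds the previous maximum length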

I'm still not sure about the mutable/immutable struct question, though.

oschulz avatar Oct 11 '17 19:10 oschulz

@timholy: These days Julia's GC is pretty impressive.

GC can still quickly become the limiting factor when running on many threads, though, in my experience.

oschulz avatar Oct 11 '17 19:10 oschulz

One way to make any dimension elastic, without sacrificing much efficiency, would simply be to allocate padding in each dimension (growing geometrically as needed whenever data is appended). This would mean that the data is no longer contiguous in memory, but:

  • Access is still the same fast column-major operation, just with different strides.
  • Resulting matrices can still be used with LAPACK, because LAPACK supports padding in the matrix columns.
  • Extending any given dimension is a fast operation because of the geometric padding growth — just like how resizing a 1d array is implemented in Julia, growing by N in any dimension would require only O(N) work. Most of the time, extending a given dimension would require no reallocation or data movement.

(Access is exactly equivalent to a SubArray view of a larger allocated array.)
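A rough sketch of that layout (illustrative code, not an existing API): keep an over-allocated buffer and expose its logical corner as a view.

buf = Matrix{Float64}(undef, 16, 16)  # capacity; grown geometrically as needed
sz  = (5, 3)                          # current logical size
A   = view(buf, 1:sz[1], 1:sz[2])     # strided column-major access, LAPACK-compatible
A .= 0.0                              # use it like a normal 5×3 matrix
sz  = (5, 4)                          # extend the second dimension: no copy needed
A   = view(buf, 1:sz[1], 1:sz[2])     # while padding remains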

That's why I'm using a Vector and its native GC-friendly resize capabilities (space stays reserved after shrink) internally.

There's nothing about this growth/shrink algorithm that you can't implement natively in Julia for multiple dimensions — it is just allocating padding and accessing a view.

stevengj avatar Jan 21 '20 17:01 stevengj

Hmhm - that would require up-front knowledge of the maximum size, right?

oschulz avatar Jan 22 '20 15:01 oschulz

Hmhm - that would require up-front knowledge of the maximum size, right?

No. You do exactly the same thing as push!(vector, x) does — when you run out of space, allocate a new array with double the space in that dimension and copy the old data over.

(Because there are only a logarithmic number of re-allocations, you get O(1) amortized time to increase any dimension by 1.)
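In sketch form, a hypothetical grow step for a 2-d buffer (names are made up):

# Double the capacity of dimension `d` once the logical size `sz[d]` has used
# up the padding; otherwise the existing buffer is reused unchanged.
function grow_dim(buf::Matrix{T}, sz::NTuple{2,Int}, d::Int) where {T}
    sz[d] < size(buf, d) && return buf   # padding left: nothing to do
    newcap = ntuple(i -> i == d ? 2 * size(buf, i) : size(buf, i), 2)
    newbuf = Matrix{T}(undef, newcap)    # reallocate with doubled capacity in d
    newbuf[axes(buf)...] = buf           # a single O(length) copy
    return newbuf
end

Since capacity doubles each time, growing a dimension to size n involves only O(log n) reallocations, which is where the amortized O(1) cost comes from.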

stevengj avatar Jan 22 '20 18:01 stevengj

Ah, yes - but that would indeed require kernel_size to be mutable (currently, ElasticArray is immutable). I'm not sure if making ElasticArray mutable would be good or bad for performance.

There's one other problem though: resizing can then invalidate existing views into the array - from what I understand, that's one of the reasons why standard multi-dimensional Arrays are not resizable. @ChrisRackauckas, do you know any details about that? Could Array become resizable in its last dimension, in principle?

Maybe we should add a separate type, like MultiElasticArray or so, which does what @stevengj suggests and gives fewer guarantees on views in return?

oschulz avatar Jan 22 '20 19:01 oschulz

A project I'm working on requires me to iteratively update a matrix X'X as I tack on columns to X in batches. I'm therefore expanding both rows and columns of X'X and this kind of functionality is necessary. I'm using the following barebones implementation, which is along the lines of what has been suggested here.

import Base: *

# Copy `public` into a freshly allocated buffer of size `private_size` and
# return a view of the copied region plus the new buffer.
function _elastic_array(
	public::V,
	private_size::NTuple{N,Int},
	W::Type=typeof(public),
) where V<:AbstractArray{T,N} where {T,N}
	private = W(undef, private_size)
	public_idx = [1:d for d in size(public)]
	private[public_idx...] = public
	public = @view private[public_idx...]
	return public::SubArray{T,N,W}, private::W
end

# `private` is the padded backing buffer; `public` is the view exposing the
# currently valid region.
mutable struct ElasticArray{T, N, V<:DenseArray{T,N}} <: DenseArray{T,N}
	public::SubArray{T,N,V}
	private::V

	function ElasticArray{T,N,V}(public) where {T,N,V}
		new(_elastic_array(public, 2 .* size(public))...)
	end
end

ElasticArray(public) = ElasticArray{eltype(public), ndims(public), typeof(public)}(public)

@inline Base.size(A::ElasticArray) = size(A.public)
@inline Base.getindex(A::ElasticArray, i::Int) = getindex(A.public, i)
# Write through the view: linear indices of the padded `private` buffer do not
# line up with those of `public`.
@inline Base.setindex!(A::ElasticArray, v, i::Int) = setindex!(A.public, v, i)
@inline Base.pointer(A::ElasticArray, i::Integer) = pointer(A.public, i)
@inline Base.unsafe_convert(::Type{Ptr{T}}, A::ElasticArray{T}) where T = Base.unsafe_convert(Ptr{T}, A.public)
Base.convert(::Type{T}, A::AbstractArray) where {T<:ElasticArray} = A isa T ? A : T(A)
*(A::Array{T}, B::ElasticArray) where T = A*B.public
*(B::ElasticArray, A::Array{T}) where T = B.public*A
Base.show(io::IO, A::ElasticArray) = Base.show(io, A.public)

# Size of `A` in the dimensions listed in `dims`, zero elsewhere.
function size_or_0(A::V, dims)::NTuple{N,Int} where V<:AbstractArray{T,N} where {T,N}
	Tuple((i in dims) ? d : 0 for (i,d) in enumerate(size(A)))
end

# Concatenate `a` onto `A` along the dimensions in `dims`, growing the backing
# buffer geometrically whenever it runs out of padding.
function cat!(A::ElasticArray{T,N,V}, a::V, dims) where {T,N,V}
	idxs_start = 1 .+ size_or_0(A, dims)
	idxs_end = size(A) .+ size_or_0(a, dims)

	if any(idxs_end .> size(A.private))
		expand_private!(A, 2 .* idxs_end)
	end

	A.private[(i:j for (i,j) in zip(idxs_start, idxs_end))...] = a
	A.public = @view A.private[(1:j for j in idxs_end)...]
	return A
end

# Reallocate the backing buffer and repoint the public view.
function expand_private!(A::ElasticArray, private_size)
	A.public, A.private = _elastic_array(A.public, private_size, typeof(A.private))
	return A
end
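
For reference, hypothetical usage of the sketch above (sizes made up):

X = ElasticArray(zeros(3, 3))
cat!(X, ones(3, 2), 2)   # append two columns -> size (3, 5)
cat!(X, ones(2, 5), 1)   # append two rows    -> size (5, 5)
size(X)                  # (5, 5)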

Hopefully it's useful to you all. Naturally, any other views into A.private that exist when the array gets resized end up pointing to old data, I suppose. The only way around that would be for ElasticArray to keep track of any of its views and repoint them to the resized A.private whenever that buffer changes. Probably more work than it's worth unless there's a compelling use case for having additional views be compatible with ElasticArray.

alejandroschuler avatar May 12 '22 00:05 alejandroschuler

Thanks @alejandroschuler - would you like to try your hand at a PR?

It would be nice if we could share as much code as possible between ElasticArray and ExtraFlexArray (or whatever we want to name it). We could add an AbstractElasticArray supertype to achieve that.

oschulz avatar May 12 '22 06:05 oschulz

I'd like to but I actually only barely know what I'm doing in julia. Re: a supertype: all the shared methods would then be dispatched on that supertype, yeah? In terms of naming, perhaps AbstractElasticArray for the supertype, ElasticArray for my generic implementation, and LastDimElasticArray for the original? Or LastDimElasticArrayViewable to make clear the possible advantage of the latter?

Related question: I noticed that in my implementation matrix multiplication was painfully slow if I didn't manually define *. Presumably this is also the case for other operations. Is there a better way than going through tons and tons of functions and repeatedly defining f(A::ElasticArray, ...) = f(A.public, ...) for each one of them? Any function f that works for a subarray (A.public) should ideally be defined this way so that it's optimized when operating on my ElasticArray, unless otherwise defined (though I think the only place we need to operate on A.private directly is in cat!). Is there a better way of doing this? In python I'd do some complicated stuff with __getattr__ so that methods that are not defined for A would instead get called on A.public. It'd be disappointing if there weren't a less clunky way to achieve the same effect in julia.

alejandroschuler avatar May 12 '22 18:05 alejandroschuler

I'd like to but I actually only barely know what I'm doing in julia.

Well, have a try if you like. :-) I can put this on my to-do list, but I'm afraid quite a few high-priority things are already overdue, so I probably won't be able to get to this myself soon.

Re: a supertype: all the shared methods would then be dispatched on that supertype, yeah?

Yes

In terms of naming, perhaps AbstractElasticArray for the supertype, ElasticArray for my generic implementation, and LastDimElasticArray for the original?

That would force us to go to v2.0.0, because it would break backward compatibility. We shouldn't change the name of the existing type. I'd be fine with ExtraFlexArray or SuperElasticArray or so for the new type.

I noticed that in my implementation matrix multiplication was painfully slow if I didn't manually define *

Linear algebra is fast for the current ElasticArray because it's dense in memory and so can specialize Base.pointer(A). An approach like the one suggested by @stevengj - reserving space in any growing dimension in larger chunks - would be efficient for resizing (O(1) amortized time) but couldn't use optimized BLAS-based linear algebra. Still, the generic linear algebra operations shouldn't be horribly slow (they just won't reach BLAS speed).

Is there a better way than going through tons and tons of functions and repeatedly defining [...]

Take a look at https://discourse.julialang.org/t/method-forwarding-macro/23355 , but it's not a magic pill.
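For a hand-rolled flavor of the same idea, a fixed list of functions can be forwarded to the wrapped view like this (illustrative only, assuming the ElasticArray sketch above with its public field):

for f in (:sum, :maximum, :minimum, :vec)
    @eval Base.$f(A::ElasticArray) = Base.$f(A.public)
end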

In the end, if storage can't be compact, I don't think you can reach optimal linear algebra speed. But if you want to extend the first dimension and still keep things compact in memory, vcat on a standard Matrix may be the best choice (and not necessarily inefficient if the chunks are comparatively large).

oschulz avatar May 12 '22 18:05 oschulz

Sounds good. We'll see who gets to it first!

Re: naming: can you tell I'm not a software engineer? haha. Yes, preserving backwards compatibility makes sense.

Thanks a bunch for that link - seems like TypedDelegation is exactly what I need here, if I can figure out how to exclude setindex!.

alejandroschuler avatar May 12 '22 18:05 alejandroschuler

vcat on a standard Matrix may be the best choice (and not necessarily inefficient if the chunks are comparatively large).

in my application it's lots of little chunks so this ended up being a huge bottleneck and that's why I'm here :)

alejandroschuler avatar May 12 '22 18:05 alejandroschuler

Re: naming: can you tell I'm not a software engineer? haha. Yes, preserving backwards compatibility makes sense.

No worries! :-) If you're interested, here's some more info on Julia and semantic versioning: https://pkgdocs.julialang.org/v1/compatibility/#Version-specifier-format

in my application it's lots of little chunks so this ended up being a huge bottleneck and that's why I'm here :)

Ah, yep, then @stevengj's scheme (see above) will be the way to go. And you'll have to do linear algebra operations with it while it grows, not just at the end, right? In that case, there's no way around generic, pure-Julia linear algebra. However, there are quite a few fast packages in that direction now that you can try out (list likely incomplete): https://github.com/mcabbott/Tullio.jl, https://github.com/MasonProtter/Gaius.jl, https://github.com/chriselrod/PaddedMatrices.jl

We can't have ElasticArrays depend on any of them - they're far too "heavy" - but your application could use them directly with the new "super-elastic" arrays.

oschulz avatar May 12 '22 22:05 oschulz

A project I'm working on requires me to iteratively update a matrix X'X as I tack on columns to X in batches.

You mean X' * X? Maybe there's also another way to approach this - what shape does X have, and in which direction(s) does it grow in your application?

oschulz avatar May 12 '22 22:05 oschulz

You mean X' * X? Maybe there's also another way to approach this - what shape does X have, and in which direction(s) does it grow in your application?

The naive implementation looks like this:

# n ≈ 1000, k ≈ 10; `result` is assumed to be initialized before the loop
X = Matrix{Float64}(undef, n, 0)
for iter in 1:total_iterations
    X_new_batch = generate_new_batch(result) # size (n, k)
    X = hcat(X, X_new_batch)                 # reallocates and copies all of X each iteration
    P = X'X
    result = heavy_computation(X, P)
end

alejandroschuler avatar May 12 '22 22:05 alejandroschuler

You mean X' * X? Maybe there's also another way to approach this - what shape does X have, and in which direction(s) does it grow in your application?

The naive implementation looks like this:

    X_new_batch = generate_new_batch(result) # size (n,k)
    X = hcat(X, X_new_batch)
    P = X'X

Would this work for you, performance-wise? Store X as an ElasticArray; then you can replace the hcat with an efficient append! - in that direction (the last dimension) an ElasticArray can already grow, after all. But don't compute X' * X explicitly - instead, whenever you have to run P * v, run X' * (X * v). You can use LinearMaps.jl to do that for you transparently:

using LinearMaps
...
X_ea = ElasticArray(...)  # initial X
Xm = LinearMap(X_ea)      # Xm.lmap === X_ea
P = Xm' * Xm              # O(1) operation: runs no calculation, just creates a new LinearMap
...
append!(X_ea, ...)        # as Xm.lmap === X_ea grows, P grows as well
result = heavy_computation(Xm, P)

P * v will now run X_ea' * (X_ea * v) under the hood. As long as heavy_computation only uses operations like multiplication and so on, this scheme can be very efficient.
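A quick sanity check of that identity with a plain matrix (sizes made up):

using LinearMaps
X = rand(100, 8)
Xm = LinearMap(X)
P = Xm' * Xm            # lazy: no 8×8 matrix is ever formed
v = rand(8)
P * v ≈ X' * (X * v)    # true - the composition is applied factor by factor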

oschulz avatar May 12 '22 23:05 oschulz

Thanks @oschulz, but no dice :( heavy_computation asks for P::AbstractMatrix. It's a complex optimization that involves factorizations of P, so I don't think the linear maps approach works here. Moreover, even if it did, doing it this way would effectively repeat a lot of computations of upper-left blocks of X'X, which is what I'd like to avoid in the first place. It's quite elegant though, so I'm sure this will come in handy for me at some point - thanks anyway!

alejandroschuler avatar May 14 '22 17:05 alejandroschuler

heavy_computation asks for P::AbstractMatrix

Ah, yes, that's sometimes a problem when using LinearMaps; I suggested making LinearMap a subtype of AbstractMatrix a few days ago: JuliaLinearAlgebra/LinearMaps.jl#180

oschulz avatar May 14 '22 19:05 oschulz

It's a complex optimization that involves factorizations of P so I don't think the linear maps approach works here

Yes, if there's explicit factorization and so on, LinearMaps won't help. But in that case, will an elastic array really save much memory allocation? After all, these factorizations will then be allocated very frequently anyhow and be of similar size, right? Or is all of it in-place?

oschulz avatar May 14 '22 19:05 oschulz

Yes, if there's explicit factorization and so on, LinearMaps won't help. But in that case, will an elastic array really save much memory allocation? After all, these factorizations will then be allocated very frequently anyhow and be of similar size, right? Or is all of it in-place?

Yeah it all has to be reallocated to make this happen, but I think the speedup comes from having to do it once instead of twice. Empirically it's much faster than the hcat implementation.

alejandroschuler avatar May 14 '22 20:05 alejandroschuler

My, you do have a tricky use case there! :-)

oschulz avatar May 14 '22 20:05 oschulz