RecurrenceAnalysis.jl icon indicating copy to clipboard operation
RecurrenceAnalysis.jl copied to clipboard

Specialized type for Boolean sparse arrays

Open heliosdrm opened this issue 5 years ago • 10 comments

Recurrence matrices are wrappers of SparseMatrixCSC{Bool,Int}, which are useful to make some operations. However it is inefficient to create them (cf. #76), and also take more memory than what is needed (all nonzero values are true, so they don't add any information at all).

It would be useful to have a special type of sparse matrices for these cases, which only store row indices and the last index of each column - but not the values, with optimized methods. This might come in handy even for other applications, not only for recurrence analysis, so in my opinion it is worth to be in a separate package (outside JuliaDynamics).

I feel like investigating if such a package exists, and otherwise draft it. When it is working, we could use it to improve performance in this package.


Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

heliosdrm avatar Sep 19 '19 13:09 heliosdrm

Indeed this seems useful. But it could also be a lot of effort.

What I think would make sense is to first implement it here, and add the operations necessary for RecurrenceAnalysis to work. Then, if you feel you want to spent more effort to make it generally useful you can do it.

But why do you suggest for it to be outside of JuliaDynamics? It can be a separate package from DynamicalSystems.jl , but what is the specific reason for it to be in another org?

Datseris avatar Sep 19 '19 13:09 Datseris

By the way, this is really funny because I was thinking to do exactly something along these lines next month, where I will be spending some time doing quality PRs all over JuliaDynamics. (Just defended my PhD last Friday!)

Datseris avatar Sep 19 '19 13:09 Datseris

I think it doesn't really matter whether that package is made inside or outside JuliaDynamics. I said that because the concept is not a specific tool for dynamical systems, so drafting it outside also makes sense, and would save the effort of maintenance by JuliaDynamics.

Nevertheless: I agree that we could start by implementing only what RecurrenceAnalysis needs. And then extend in a separate package it if it is worth the effort.

heliosdrm avatar Sep 19 '19 13:09 heliosdrm

By the way, this is really funny because I was thinking to do exactly something along these lines next month, where I will be spending some time doing quality PRs all over JuliaDynamics. (Just defended my PhD last Friday!)

Congratulations!!!

heliosdrm avatar Sep 19 '19 13:09 heliosdrm

Congratulations, George!

Am 19.09.2019 um 15:40 schrieb George Datseris [email protected]:

By the way, this is really funny because I was thinking to do exactly something along these lines next month, where I will be spending some time doing quality PRs all over JuliaDynamics. (Just defended my PhD last Friday!)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/JuliaDynamics/RecurrenceAnalysis.jl/issues/79?email_source=notifications&email_token=AAHNNEBJ2UQVDY4P7UY2WOLQKN6MPA5CNFSM4IYLGUD2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7DQBPY#issuecomment-533135551, or mute the thread https://github.com/notifications/unsubscribe-auth/AAHNNECQTLSNSL6FJA6BYDTQKN6MPANCNFSM4IYLGUDQ.

pucicu avatar Sep 19 '19 20:09 pucicu

Congratulations!

Here's some code that should help avoid materializing the Boolean values:

module SingleValueArrays

struct SingleValueArray{T, N} <: AbstractArray{T, N}
    value::T
    size::NTuple{N, Int64}
end

SingleValueArray(i::Int64, val::T) where T = SingleValueArray{T, 1}(Tuple(i), val)

Base.size(s::SingleValueArray) = s.size

Base.length(s::SingleValueArray) = prod(s.size)

Base.getindex(s::SingleValueArray, args...) = s.value # we really don't care

Base.setindex!(s::SingleValueArray, args...) = error("SingleValueArrays are immutable!")

export SingleValueArray

end

Then, we could just pass this into the SparseMatrixCSC constructor.

I'm not sure if we can avoid materializing the array when constructing the sparse matrix, though. Will need to be tested.

Edit: one could envision a segmented vector, which has n values and breaks between them; that might avoid materializing the full column array, as well (at some runtime penalty for index lookup).

asinghvi17 avatar Sep 25 '19 16:09 asinghvi17

Base.getindex(s::SingleValueArray, args...) = s.value # we really don't care

well, we do care a little bit :P one has to check for out of bounds here. This type does not store any information about which entries have the value. I think the best way forward is indeed the original suggestion of making a specialized type of sparse array that has the I and the J of the sparse matrix, but not the values (the nnz).

Datseris avatar Sep 25 '19 20:09 Datseris

My idea was that we could put this array type as the nzvals while constructing the sparse matrix, so that we can avoid materializing the trues. In that case, though we could do bounds checking, it may not really be necessary.

asinghvi17 avatar Sep 25 '19 20:09 asinghvi17

ah, okay, now I understand. Sorry. What does size achieve then?...

Datseris avatar Sep 25 '19 20:09 Datseris

There's an internal size check in the SparseArray constructor IIRC, so one needs to be able to provide a size.

asinghvi17 avatar Sep 25 '19 20:09 asinghvi17