ComplexityMeasures.jl icon indicating copy to clipboard operation
ComplexityMeasures.jl copied to clipboard

Performance of `AAPE` is suboptimal

Open Datseris opened this issue 2 years ago • 10 comments

internal function AAPE has really bad performance, allocating 10 vectors for each element of a dataset... Thankfully it is easy to optimize it to do 0 allocations so I leave this here for a newcomer that would like to contribute.

Datseris avatar Dec 22 '22 20:12 Datseris

Hello, I'm interested in taking on this issue and optimizing the AAPE function to improve its performance. Before I claim the issue, I was wondering if you could tell me where the function is located in the directory. This will help me to determine if I have the necessary skills to optimize the function. Thank you in advance for your help.

nyedr avatar Dec 22 '22 22:12 nyedr

Hello, I'm interested in taking on this issue and optimizing the AAPE function to improve its performance.

Hey, @EidanGar! We're happy to hear that!

Before I claim the issue, I was wondering if you could tell me where the function is located in the directory.

On the main branch, the AAPE function is located in src/probabilities_estimators/permutation_ordinal/SymbolicPermutation.jl. The source code is relatively simple and looks like this:

"""
    AAPE(x, A::Real = 0.5, m::Int = length(x))

Encode relative amplitude information of the elements of `a`.
- `A = 1` emphasizes only average values.
- `A = 0` emphasizes changes in amplitude values.
- `A = 0.5` equally emphasizes average values and changes in the amplitude values.
"""
function AAPE(x, A::Real = 0.5, m::Int = length(x))
    (A/m)*sum(abs.(x)) + (1-A)/(m-1)*sum(abs.(diff(x)))
end

You can assume x is an m-element tuple/vector. The precise values in the input x is not important. The goal would be to reduce allocations (and computation time, if possible). You can measure how much the function allocates, and how long it takes to run, by using the @btime macro from the BenchmarkTools.jl package

This will help me to determine if I have the necessary skills to optimize the function. Thank you in advance for your help.

Don't worry if you don't have a complete handle on the code/topic immediately. The function is small , so working on such an issue is a great opportunity to learn about code optimization in general. There are plenty of resources on code optimization for Julia, and a good place to start is in the official Julia language documentation.

If you need some help to get started, let us know. But for now, I leave no further hints, so you can have the pleasure of figuring it out yourself :D

EDIT: Just so you know, @EidanGar, we might change the folder structure a bit as part of our final code refactoring for V2, which is coming up really soon. However, the AAPE function will still be in the SymbolicPermutation.jl, possibly just moved one folder level down. This won't affect any work you do on the function itself.

kahaaga avatar Dec 22 '22 22:12 kahaaga

I just ran @btime on this with x::Vector{Float64} and I got 6 allocations.

julia> x = rand(4000);

julia> @btime AAPE($x)
  10.800 μs (6 allocations: 93.89 KiB)
0.4158251611947962```

Are datasets significantly different from Vectors that they cause 4 more allocations?

Chronum94 avatar Dec 27 '22 15:12 Chronum94

Are datasets significantly different from Vectors that they cause 4 more allocations?

@Chronum94 In the applications we have here, AAPE would typically be called with elements of Datasets, which are SVectors. That is, you'd do

x = Dataset(rand(100, 3))
AAPE.(x)
AAPE(x[2]) 

but not

x = Dataset(rand(100, 3))
AAPE(x)

In our applications (ordinal pattern encoding), AAPE will typically only operate on SVectors of length probably not much longer than 10-15.

So to answer you question: it is not the difference between Datasets and Vector that causes the allocations here, it's the specific implementation (using broadcasting etc).

kahaaga avatar Dec 27 '22 15:12 kahaaga

I tried the example given and I get an error which I've seen elsewhere, but don't quite know how to interpret:

julia> x = Dataset(rand(100, 3));

julia> AAPE.(x)
ERROR: MethodError: Cannot `convert` an object of type StaticArraysCore.SVector{3, Float64} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:273
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:185
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:130
  ...
Stacktrace:
 [1] setindex!(A::Vector{Float64}, x::StaticArraysCore.SVector{3, Float64}, i1::Int64)
   @ Base .\array.jl:966
 [2] copyto!
   @ .\abstractarray.jl:904 [inlined]
 [3] _collect
   @ .\array.jl:718 [inlined]
 [4] collect
   @ .\array.jl:712 [inlined]
 [5] broadcastable
   @ .\broadcast.jl:704 [inlined]
 [6] broadcasted(::Function, ::Dataset{3, Float64})
   @ Base.Broadcast .\broadcast.jl:1296
 [7] top-level scope
   @ REPL[145]:1

Indexing into one of the rows, however, gets me:

 @btime AAPE3(x[1])
  67.143 ns (2 allocations: 48 bytes)
0.4063432882381257

Chronum94 avatar Dec 27 '22 15:12 Chronum94

Ah, sorry, my bad. I didn't test the code. Broadcasting like that isn't implemented Dataset yet. You'll have to do

import Entropies.AAPE # make sure you've checked out main branch of Entropies.jl
using StateSpaceSets

x = Dataset(rand(100, 3))
AAPE.(x.data) # or
[AAPE(pt) for pt in x]

kahaaga avatar Dec 27 '22 16:12 kahaaga

do vec(x) instead of x.data, as that's the public way to get the vector.

Datseris avatar Dec 27 '22 16:12 Datseris

julia> x = Dataset(rand(1000, 15));
julia> @btime AAPE.($vec(x));
  7.900 μs (3 allocations: 7.97 KiB)

Still not seeing it, maybe I'm missing something? This is on Julia 1.8.3.

Chronum94 avatar Dec 27 '22 16:12 Chronum94

What's the question? Why you get allocations? the result of AAPE.($vec(x)) is a new vector containing the application of AAPE to each inner vector of x. Bechmark instead:

x = Dataset(rand(1000, 15))
y = x[1]
@btime AAPE($y)

Datseris avatar Dec 27 '22 17:12 Datseris

My bad, I was referring to the original issue of 10 allocations per element of Dataset, which I'm not seeing here. The above code snippet does in fact give 0 allocations. I was just looking at this to try and figure out where the suboptimal performance was (and maybe explore it).

Chronum94 avatar Dec 27 '22 18:12 Chronum94