Gridap.jl
Extend table to support arbitrary vector types
Change from this:

```julia
struct Table{T,P} <: AbstractVector{Vector{T}}
  data::Vector{T}
  ptrs::Vector{P}
end
```

to this:

```julia
struct Table{T,Vd<:AbstractVector{T},Vp<:AbstractVector} <: AbstractVector{Vector{T}}
  data::Vd
  ptrs::Vp
end
```
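With the generalized struct, `data` and `ptrs` are no longer tied to `Vector`. A minimal sketch of what this enables (the `Int32` pointer vector here is just an illustrative choice, not part of the proposal):

```julia
struct Table{T,Vd<:AbstractVector{T},Vp<:AbstractVector} <: AbstractVector{Vector{T}}
  data::Vd
  ptrs::Vp
end

# The old Vector-based layout is still expressible ...
t1 = Table{Float64,Vector{Float64},Vector{Int}}(rand(6), [1,3,5,7])

# ... but other AbstractVector subtypes become admissible too,
# e.g. Int32 pointers to halve the memory used by the ptrs field.
t2 = Table{Float64,Vector{Float64},Vector{Int32}}(rand(6), Int32[1,3,5,7])
```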
Allow setting values in a `Table` using the standard Julia vector-of-vectors interface, i.e.,

```julia
a = Table(....)
a[i][j]
```
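One way to support both reads and writes through this interface is to have `getindex` return a `view` into `data`, the same idea exercised in the benchmark below. A minimal self-contained sketch (simplified `Table` without the abstract supertype):

```julia
struct Table{Vd,Vp}
  data::Vd
  ptrs::Vp
end

# Returning a view makes a[i] a window into data, so both reads
# and writes via a[i][j] hit the underlying storage.
Base.getindex(a::Table, i::Integer) = view(a.data, a.ptrs[i]:a.ptrs[i+1]-1)
Base.length(a::Table) = length(a.ptrs) - 1

a = Table([10.0, 20.0, 30.0, 40.0], [1, 3, 5])
a[1][2] = 99.0   # writes through the view into a.data
```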
@santiagobadia @amartinhuertas
I have developed these simple benchmarks to check the performance hit of using views when indexing a table
```julia
module KK

struct Table{Vd,Vp}
  data::Vd
  ptrs::Vp
end

@inline function Base.getindex(a::Table,i::Integer)
  @inbounds pini = a.ptrs[i]
  @inbounds pend = a.ptrs[i+1]-1
  ids = pini:pend
  view(a.data,ids)
end

function Base.length(a::Table)
  length(a.ptrs)-1
end

function length_to_ptrs!(ptrs)
  ptrs[1] = 1
  @inbounds for i in 1:(length(ptrs)-1)
    ptrs[i+1] += ptrs[i]
  end
end

function random_table(ng,nl)
  ptrs = fill(nl,ng+1)
  length_to_ptrs!(ptrs)
  data = rand(nl*ng)
  Table(data,ptrs)
end

function sum_like_array(a::Table)
  s = 0.0
  @inbounds for i in 1:length(a)
    for j in 1:length(a[i])
      s += a[i][j]
    end
  end
  s
end

function sum_like_table(a::Table)
  s = 0.0
  @inbounds for i in 1:length(a)
    pini = a.ptrs[i]
    pend = a.ptrs[i+1]-1
    for p in pini:pend
      s += a.data[p]
    end
  end
  s
end

function fill_like_array!(a::Table)
  s = 0.0
  @inbounds for i in 1:length(a)
    for j in 1:length(a[i])
      a[i][j] = s
    end
  end
  a
end

function fill_like_table!(a::Table)
  s = 0.0
  @inbounds for i in 1:length(a)
    pini = a.ptrs[i]
    pend = a.ptrs[i+1]-1
    for p in pini:pend
      a.data[p] = s
    end
  end
  a
end

using BenchmarkTools

a = random_table(Int(1e7),10)

@btime sum_like_array($a)
@btime sum_like_table($a)
@btime fill_like_array!($a)
@btime fill_like_table!($a)

end # module
```
The timing results are:

```
129.072 ms (0 allocations: 0 bytes)   # sum_like_array
123.982 ms (0 allocations: 0 bytes)   # sum_like_table
115.403 ms (0 allocations: 0 bytes)   # fill_like_array!
 65.649 ms (0 allocations: 0 bytes)   # fill_like_table!
```
It seems that accessing the fields of the table explicitly is more performant when we fill the table (65.649 ms vs 115.403 ms). When we only read the values, explicit field access and views perform similarly.
However, if we deactivate inlining with

```julia
@noinline function Base.getindex(a::Table,i::Integer)
  @inbounds pini = a.ptrs[i]
  @inbounds pend = a.ptrs[i+1]-1
  ids = pini:pend
  view(a.data,ids)
end
```
then using views becomes very inefficient:

```
  1.066 s (110000000 allocations: 4.92 GiB)   # sum_like_array
122.575 ms (0 allocations: 0 bytes)           # sum_like_table
  1.048 s (110000000 allocations: 4.92 GiB)   # fill_like_array!
 66.400 ms (0 allocations: 0 bytes)           # fill_like_table!
```
Hi @fverdugo,

thanks for sharing this micro-benchmark. I get the same (qualitative) outcome on my machine. In particular:
- With `@inline` and without it:

  ```
  126.372 ms (0 allocations: 0 bytes)
  120.451 ms (0 allocations: 0 bytes)
  228.045 ms (0 allocations: 0 bytes)
   64.823 ms (0 allocations: 0 bytes)
  ```

- With `@noinline`:

  ```
    1.386 s (110000000 allocations: 4.92 GiB)
  119.027 ms (0 allocations: 0 bytes)
    1.349 s (110000000 allocations: 4.92 GiB)
   64.400 ms (0 allocations: 0 bytes)
  ```
As a side note, I also tried on my machine the following optimization (at the expense of extra memory consumption), which somehow better resembles the way I am now handling the implementation of `MPIPETScDistributedVector`:

```julia
struct Table{Vd,Vp,Vv}
  data::Vd
  ptrs::Vp
  views::Vv
  function Table(data::Vd,ptrs::Vp) where {Vd,Vp}
    TSUB = SubArray{eltype(Vd),1,Vd,Tuple{UnitRange{Int64}},true}
    views = TSUB[ view(data,ptrs[i]:ptrs[i+1]-1) for i in 1:length(ptrs)-1 ]
    new{Vd,Vp,typeof(views)}(data,ptrs,views)
  end
end

@noinline function Base.getindex(a::Table,i::Integer)
  @inbounds v = a.views[i]
  v
end
```
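For completeness, a self-contained usage sketch of this precomputed-views variant (repeating the struct so the example runs on its own). Since the stored `SubArray`s write through to `data`, reads and writes via `a[i][j]` both work, though the `views` field would need rebuilding if `ptrs` ever changed:

```julia
# Precomputing the views trades memory for getindex cost:
# getindex becomes a plain array lookup, so even without inlining
# it does not allocate.
struct Table{Vd,Vp,Vv}
  data::Vd
  ptrs::Vp
  views::Vv
  function Table(data::Vd, ptrs::Vp) where {Vd,Vp}
    TSUB = SubArray{eltype(Vd),1,Vd,Tuple{UnitRange{Int64}},true}
    views = TSUB[view(data, ptrs[i]:ptrs[i+1]-1) for i in 1:length(ptrs)-1]
    new{Vd,Vp,typeof(views)}(data, ptrs, views)
  end
end

Base.getindex(a::Table, i::Integer) = a.views[i]

a = Table([1.0, 2.0, 3.0, 4.0], [1, 3, 5])
a[2][1] = 7.0   # the stored view writes through to data
```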
- With `@inline` and without it:

  ```
  144.999 ms (0 allocations: 0 bytes)
  122.821 ms (0 allocations: 0 bytes)
  106.697 ms (0 allocations: 0 bytes)
   66.059 ms (0 allocations: 0 bytes)
  ```

- With `@noinline`:

  ```
  296.765 ms (0 allocations: 0 bytes)
  117.954 ms (0 allocations: 0 bytes)
  241.276 ms (0 allocations: 0 bytes)
   65.924 ms (0 allocations: 0 bytes)
  ```
First, there seems to be a performance hit whenever we sum all entries of the table (144.999 ms versus 126.372 ms), but a performance improvement whenever we fill the table (106.697 ms versus 228.045 ms). Second, although `@noinline` impacts performance, it no longer has such a significant impact (indeed now, as expected, there are no allocations).
In any case, what is clear from this experiment is that accessing the table entries through the raw `ptrs` and `data` fields directly is the most efficient alternative (as we expected during the meeting). On the other hand, to be honest, looking just at the raw timings, it is hard (at least up to my current knowledge) to determine the actual causes behind the behaviour we are observing. There are very complex issues involved under the hood, with low-level compiler optimizations, memory locality, and the cache system at the forefront.