Mill.jl icon indicating copy to clipboard operation
Mill.jl copied to clipboard

Enhancement: Various speedups

Open simonmandlik opened this issue 3 years ago • 3 comments

Noting down some areas where significant speedups may be achieved:

  • vcat in ProductNodes leads to a lot of copying
  • data deduplication in leaves may lead to lower memory requirements and also to saving some compute (computing ngrams only once for identical strings in NGramMatrix multiplication)
  • deduplicating instances in BagNodes in a similar fashion

simonmandlik avatar Mar 15 '21 16:03 simonmandlik

PoC for deduplication in NGramMatrices

julia> function Mill._mul(A::AbstractMatrix, S::PooledVector, n, b, m)
           C = zeros(eltype(A), size(A, 1), length(S))
           iz = Mill._init_z(n, b)
           idcs = Dict(r => Queue{Int}() for r in values(S.invpool))
           for (i,r) in S.refs |> enumerate
               enqueue!(idcs[r], i)
           end
           for (s, r) in S.invpool
               z = iz
               for l in 1:Mill._len(s, n)
                   z = Mill._next_ngram(z, l, codeunits(s), n, b)
                   zm = z%m + 1
                   for k in idcs[r]
                       for i in 1:size(C, 1)
                           @inbounds C[i, k] += A[i, zm]
                       end
                   end
               end
           end
           C
       end

julia> ss = [randstring(50), randstring(50)];
julia> S = [rand(ss) for _ in 1:100];
julia> n1 = NGramMatrix(S);
julia> n2 = NGramMatrix(PooledArray(S));
julia> x = randn(100, 2053);
julia> x*n1; @btime x*n1;
  181.145 μs (2 allocations: 78.20 KiB)

julia> x*n2; @btime x*n2;
  108.265 μs (14 allocations: 95.23 KiB)

simonmandlik avatar May 31 '21 07:05 simonmandlik

I guess that dedup in NGramMatrices will offer the highest benefit. Multiplication of OneHot is essentially a copying, and Dense matrices should not contain that many duplicates (although they might).

pevnak avatar Jul 15 '21 06:07 pevnak

Sure, and deduplication of instances in BagNodes as well. That said, it is possible that in some cases the vanilla version will still be faster

simonmandlik avatar Jul 16 '21 08:07 simonmandlik