IndirectArrays.jl

Noninteger indexes

Open • jtrakk opened this issue 3 years ago • 1 comment

I would like to look up values in my IndirectArray using non-integer indexes. LabelledArrays.jl supports non-integer indexes, but IndirectArrays currently enforces ::Int for the index type. Could IndirectArrays loosen this type restriction?

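using LabelledArrays
using IndirectArrays

indexes = Symbol.(["a", "b", "b", "c", "c", "c"])
values = [10, 20, 30]
# Map each distinct label to its position: :a => 1, :b => 2, :c => 3
label_dict = Dict(v=>k for (k,v) in enumerate(unique(indexes)))
# LVector holding those positions, addressable by label
label_array = LVector(;label_dict...)
@assert label_array[:c] == 3
# IndirectArray semantics: lookup[i] == values[label_array[i]]
lookup = IndirectArray(label_array, values)
lookup[:c] == 30  # intended to be true, but throws the ArgumentError below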

ArgumentError: invalid index: :c of type Symbol
Stacktrace:
 [1] to_index(::Symbol) at ./indices.jl:297
 [2] to_index(::IndirectArray{Int64,1,LArray{Int64,1,Array{Int64,1},(:a, :b, :c)},Array{Int64,1}}, ::Symbol) at ./indices.jl:274
 [3] to_indices at ./indices.jl:325 [inlined]
 [4] to_indices at ./indices.jl:322 [inlined]
 [5] getindex(::IndirectArray{Int64,1,LArray{Int64,1,Array{Int64,1},(:a, :b, :c)},Array{Int64,1}}, ::Symbol) at ./abstractarray.jl:1060

jtrakk, Nov 19 '20 01:11
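
The requested behaviour can be sketched with a small custom vector type. This is a hypothetical illustration, not the IndirectArrays.jl implementation: `KeyedIndirect` is an invented name, and a Base `NamedTuple` stands in for the `LVector` so the sketch runs without extra packages. It keeps the `A[i] == values[index[i]]` rule for integer indexes and adds a `Symbol` method that forwards the key to the index container.

```julia
# Hypothetical sketch: an IndirectArray-like vector whose getindex also
# accepts Symbol keys. Not the IndirectArrays.jl implementation.
struct KeyedIndirect{T,I} <: AbstractVector{T}
    index::I            # positions into `values`; here also indexable by label
    values::Vector{T}   # the distinct values being looked up
end

Base.IndexStyle(::Type{<:KeyedIndirect}) = IndexLinear()
Base.size(A::KeyedIndirect) = (length(A.index),)

# Integer indexing, as in IndirectArrays: A[i] == A.values[A.index[i]]
Base.getindex(A::KeyedIndirect, i::Int) = A.values[A.index[i]]

# Loosened restriction: forward a Symbol key to the index container,
# which must itself support Symbol indexing (a NamedTuple does).
Base.getindex(A::KeyedIndirect, k::Symbol) = A.values[A.index[k]]

# Usage mirroring the report, with a NamedTuple in place of the LVector
labels = (a = 1, b = 2, c = 3)
lookup = KeyedIndirect(labels, [10, 20, 30])
lookup[:c]   # 30
lookup[2]    # 20
```

The sketch only covers scalar lookup; an actual change to IndirectArrays.jl would also need to decide how such keys interact with bounds checking, views and the rest of the array interface.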

StaticArrays.jl

Construction of big MArrays from arrays is slow (#542)

This shouldn't be expected to be fast. All of the specializations being done are based specifically on the number of elements, which will not scale well.

> From the README and the docs I assumed that MArrays are fine for big arrays, unlike SArrays.

What text in the README led to this? That part should be clarified.

ChrisRackauckas, Nov 16 '18 10:11

Well, essentially the README is all about SArrays and does not mention the other types. So maybe a sentence like "All types in the StaticArrays package are performant only for small arrays (~100 elements)" would do, if that is correct.

BeastyBlacksmith, Nov 16 '18 10:11

Yes, sorry if that wasn't clear: the StaticArrays methods aren't well suited to large arrays. Julia isn't built to efficiently handle (i.e. compile code for) tuples of 5000 elements, and internally StaticArrays works with tuples.

andyferris, Nov 16 '18 11:11

After revisiting the docs, I think a SizedArray will be most suitable for my case (construction is still 3x slower than for a plain Array, but that's fine), so the sentence I proposed above is a little too harsh.

BeastyBlacksmith, Nov 16 '18 11:11

@BeastyBlacksmith, "performance" isn't quite the right word here: most/all of what you're measuring is probably compile time, not run time. Suppose you have a function foo that takes an AbstractVector. If you call foo on a regular Vector, Julia will compile just one version of foo and it will work for vectors of length n and n+1. With SArray/MArray, the compiler will generate separate versions of foo specialized for n and n+1. That makes a lot of sense for small vectors (you can get rid of loops, for example, and increase the performance of an algorithm that has to run in 1, 2, or 3 dimensions), but there is typically little point for large ones.

Compilation is slow, especially for large tuples. You might still find that, once compiled, the methods run quite quickly, which is why "performance" is not a very good way of describing the problem here.

timholy, Nov 16 '18 11:11
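
The compile-time versus run-time distinction can be illustrated with Base Julia alone. A `Vector`'s length is not part of its type, whereas a tuple's length is (and StaticArrays stores its data in tuples). Because Julia compiles a method separately for each concrete argument type, timing the first and second call of a function on a large tuple separates compilation from execution. The helper `tuple_sum` is hypothetical, and the timings will vary by machine and Julia version.

```julia
# Vectors of different lengths share one concrete type, so a method taking
# a Vector{Float64} is compiled once for all lengths.
same_vector_type = typeof(zeros(3)) == typeof(zeros(4))

# Tuples carry their length in the type (NTuple{3,Float64} vs NTuple{4,Float64}),
# so each new length requires a new specialization.
same_tuple_type = typeof(ntuple(_ -> 0.0, 3)) == typeof(ntuple(_ -> 0.0, 4))

# Hypothetical helper used to separate compile time from run time.
tuple_sum(t::Tuple) = sum(t)

t = ntuple(i -> Float64(i), 500)       # NTuple{500,Float64}
first_call  = @elapsed tuple_sum(t)    # includes compiling tuple_sum for this tuple type
second_call = @elapsed tuple_sum(t)    # reuses the already-compiled method
println("first call: $(first_call) s, second call: $(second_call) s")
```

On a typical machine the first call is dominated by compilation, which is the cost timholy describes; the second call reflects the actual run time.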

> You might still find that, once compiled, the methods run quite quickly.

This is not the problem here:

julia> using StaticArrays
julia> N=500;r=rand(N); 
julia> @time MVector{N}(r);
  1.555657 seconds (1.07 M allocations: 54.239 MiB, 14.38% gc time)
julia> @time MVector{N}(r);
  0.000044 seconds (17 allocations: 8.922 KiB)

We see that the allocated memory (8.922 KiB in 17 allocations) is more than twice the expected ~4 KB (500 Float64 values × 8 bytes), so something is wrong. After compilation, this should be a simple memcopy plus bounds checks.

julia> N=500;r=rand(N);
julia> @btime MVector{$N}($r);
  20.392 μs (13 allocations: 8.77 KiB)
julia> N=1000;r=rand(N);
julia> @btime MVector{$N}($r);
  36.625 μs (14 allocations: 16.59 KiB)
julia> @btime copy($r);
  974.400 ns (1 allocation: 7.94 KiB)

chethega, Nov 16 '18 17:11
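
The "expected 4 kb" in the comment above is simple arithmetic: 500 `Float64` values at 8 bytes each is 4000 bytes. The sketch below makes that explicit, measures a plain `copy` of a `Vector{Float64}` as the memcopy baseline, and works out the slowdown implied by the reported N = 1000 timings. The helper `copy_bytes` is hypothetical; measured byte counts include a small array header and may differ slightly between Julia versions.

```julia
N = 500
r = rand(N)

# Payload of 500 Float64 values: the expected ~4 KB.
expected_bytes = N * sizeof(Float64)

# Baseline: bytes allocated by a plain copy, measured inside a function
# after a warm-up call so that compilation is not counted.
copy_bytes(x) = @allocated copy(x)
copy_bytes(r)
measured_bytes = copy_bytes(r)
println("expected $(expected_bytes) B, copy allocated $(measured_bytes) B")

# Slowdown implied by the @btime results reported above for N = 1000.
mvector_ns = 36.625e3             # MVector{1000}(r): 36.625 μs
copy_ns = 974.4                   # copy(r): 974.400 ns
slowdown = mvector_ns / copy_ns   # about 37.6x
```

By these reported numbers, constructing the `MVector` costs roughly 37 ns per element, against under 1 ns per element for the plain copy.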

@chethega, interpolating $N doesn't seem to work quite as I expected in @btime here. The following is confusing (julia-1.2-rc1):

julia> N=1000;r=rand(N);

julia> @btime MVector{$N}($r);
  3.978 μs (14 allocations: 16.59 KiB)

julia> @btime MVector{1000}($r);
  692.565 ns (1 allocation: 7.88 KiB)

c42f, Jul 31 '19 08:07

I finally found this issue and noticed the description of speed in the docs. In my routine work, when I want to deal with performance issues, it means many 'large' arrays T_T

kaji331, Jul 18 '23 08:07

How large, if I may ask?

andyferris, Jul 18 '23 08:07

About 30000 rows, 3000 to 1 million columns

Subject: Re: [JuliaArrays/StaticArrays.jl] Construction of big MArrays from arrays is slow (#542)

kaji331, Jul 23 '23 13:07