RFC: syntax: add noinline/inline support overrides for structs

Open vtjnash opened this issue 1 year ago • 4 comments

I have sometimes wanted this: to be able to control whether a particularly large struct is stored by reference or by value. This was somewhat of a quick job, so it feels like a bit of a hacky implementation, but maybe that is okay. Long term, I would also like @noinline to work when applied to a field (though that is hard) or when written syntactically inside the struct instead of outside it. But this seemed to at least get the feature working.

An example preview:

julia> @noinline struct NoinlineSingleton end

julia> @inline struct NormalSingleton end

julia> @inline struct Impossible; x::Impossible; end
ERROR: Cannot apply @inline to this struct which is self-referential or mutable
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1

julia> sizeof(Tuple{NoinlineSingleton}) # observe this is allocated as a pointer to a singleton
8

julia> sizeof(Tuple{NormalSingleton}) # observe this is allocated inline as zero bytes
0
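The same by-reference vs. inline distinction can already be observed on existing types via the pre-existing reflection Base.allocatedinline (a minimal sketch using only standard Base names; the 8-byte figure assumes a 64-bit machine):

```julia
# isbits types are stored inline inside containers.
@assert Base.allocatedinline(Int)

# A mutable struct is always stored by reference...
mutable struct MutBox
    x::Int
end
@assert !Base.allocatedinline(MutBox)

# ...which is why a container element of that type costs one pointer:
@assert sizeof(Tuple{MutBox}) == sizeof(Ptr{Cvoid})  # 8 on a 64-bit machine
```

This PR would make that reflection result controllable at definition time rather than purely a consequence of mutability and layout.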

vtjnash avatar Sep 28 '23 16:09 vtjnash

This looks nice to have! However, I think it'd typically be more useful to have something like this in field position rather than as a universal decorator for a given type -- that is, I want to be able to write

struct Foo
    @inline x::Bar
end

to have Bar be inline in Foo. The reason I want that is that writing e.g.

struct Foo{T, U, V, W}
     x::Tuple{T, U, V, W}
end

is not always the same as having

struct Foo{T, U, V, W}
    a::T
    b::U
    c::V
    d::W
end

and I have no control over the definition of Tuple, so I can't mark it as @inline (even if it were a regular Julia struct).
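One case where the two definitions above genuinely differ in layout (a hedged sketch under standard layout rules, assuming 64-bit pointers) is when a parameter is abstract: the tuple type is then non-concrete, so the whole tuple field is boxed, whereas the split-field version keeps the concrete parts inline:

```julia
struct TupleField
    # Tuple{Int, AbstractString} is not a concrete type, so this field
    # is stored as a single boxed reference.
    x::Tuple{Int, AbstractString}
end

struct SplitFields
    a::Int             # concrete: stored inline
    b::AbstractString  # abstract: stored as a pointer
end

# On a 64-bit machine: one pointer vs. an inline Int plus a pointer.
@assert sizeof(TupleField) == sizeof(Ptr{Cvoid})
@assert sizeof(SplitFields) == sizeof(Int) + sizeof(Ptr{Cvoid})
```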

MasonProtter avatar Sep 29 '23 15:09 MasonProtter

Can we use something other than @inline? Having the same macro mean completely different things in different contexts feels confusing to me.

KristofferC avatar Sep 29 '23 16:09 KristofferC

What else would you call it, though? In Lisp-family languages (like Julia), code is simply data, and both uses are decisions about whether or not to copy data inline into a parent object. Should we be more precise, like @allocateinline (to match the pre-existing reflection function for this property, Base.allocatedinline())? That could potentially make the patch much harder, since we would need to adapt lowering rather than hack it on afterwards as this PR does. I realize that reasoning probably shouldn't drive our API choices, though.

@MasonProtter you are welcome to implement that, but I specifically mentioned that is harder and a long-term idea, but not one for this PR

vtjnash avatar Oct 02 '23 16:10 vtjnash

Yes, I think something like @allocateinline would be better.

KristofferC avatar Feb 14 '24 10:02 KristofferC

I'm interested in a feature like this to be able to do atomic operations on inner structs like:

struct Foo
    @atomic data::Bar
    meta::Baz
end

If Bar is a non-mutable struct, then it often ends up inline allocated in Foo which makes the atomic operations very slow.

I think in an "ideal world", the atomic field would be non-inline-allocated by default (preferring word size atomics), and then I could annotate the field with @inlinefield (or @allocateinline or w/e) to force it to be inlined if I really want that behavior.

However, I don't really want to decide that behavior globally, as this PR currently does. It doesn't make any more sense to declare Bar as globally inline-allocated than it does to declare Bar as globally atomic: the decision ought to be made by the user of the datatype, not its definer (although setting a default seems fine).

topolarity avatar Jul 04 '24 21:07 topolarity

which makes the atomic operations very slow.

Citation needed? It only has one extra atomic operation (on the same cache line) compared to the pointer case but elides an allocation, so it would be expected to be faster to inline in nearly all cases where it is possible

vtjnash avatar Jul 05 '24 05:07 vtjnash

Citation needed? It only has one extra atomic operation (on the same cache line) compared to the pointer case but elides an allocation, so it would be expected to be faster to inline in nearly all cases where it is possible

Sure, this example is about ~113x slower when inline-allocated:

julia> struct Inner
           a::UInt
           b::UInt
           c::UInt
           d::UInt
           e::UInt
       end

julia> mutable struct Foo
           @atomic inner::Inner
           const y::UInt
       end

julia> function swapalot(o, a, b)
           for i = 1:1_000_000
               inner = @atomic :acquire o.inner
               if inner.a == 1
                   @atomic :release o.inner = a
               elseif inner.a == 0
                   @atomic :release o.inner = b
               end
           end
       end

julia> a = Inner(0,0,0,0,0); b = Inner(1,0,0,0,0); o = Foo(a, 0)

julia> using BenchmarkTools

julia> @btime swapalot(o, a, b)
  22.156 ms (0 allocations: 0 bytes)

Changing the code to mutable struct Inner speeds it up dramatically:

julia> @btime swapalot(o, a, b)
  196.052 μs (0 allocations: 0 bytes)

topolarity avatar Jul 05 '24 11:07 topolarity
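The size dependence behind this result can be sketched directly (a hedged illustration using only standard syntax; the exact lock-free threshold is platform-dependent): an atomic field wider than the hardware's atomic width goes through the runtime's locked fallback, while a word-size field, such as the single pointer in the mutable-Inner version, can use lock-free hardware atomics.

```julia
struct Inner5
    a::UInt; b::UInt; c::UInt; d::UInt; e::UInt
end

mutable struct WideAtomic
    @atomic x::Inner5   # 40 bytes stored inline: runtime falls back to a lock
end

mutable struct WordAtomic
    @atomic x::UInt     # one machine word: lock-free hardware atomic
end

# The inline copy is visible in the field's width:
@assert sizeof(Inner5) == 5 * sizeof(UInt)
```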

Sure, if you are comparing two unrelated operations, that may be the case. But if you compare similar things, then the difference is much reduced:

julia> mutable struct Foo
           @atomic inner::Base.RefValue{Inner}
           const y::UInt
       end

julia> function swapalot(o, a, b)
           for i = 1:1_000_000
               inner = (@atomic :acquire o.inner)[]
               if inner.a == 1
                   @atomic :release o.inner = Ref(a)
               elseif inner.a == 0
                   @atomic :release o.inner = Ref(b)
               end
           end
       end

julia> a = Inner(0,0,0,0,0); b = Inner(1,0,0,0,0); o = Foo(Ref(a), 0);

julia> @btime swapalot(o, a, b)
  11.001 ms (1000000 allocations: 45.78 MiB)
# vs 33.200 ms (0 allocations: 0 bytes) on this machine

As far as I know, the remaining difference could be solved by using a better design for the lock itself, since right now it is a very generic jl_mutex_unlock_nogc and thus may be somewhat slow for this particular use case.

vtjnash avatar Jul 05 '24 20:07 vtjnash