RFC: syntax: add noinline/inline support overrides for structs
I have wanted this sometimes: to be able to control whether a particularly large struct is stored by reference or by value. This was somewhat of a quick job, so it feels like a bit of a hacky implementation, but maybe that is okay. Long term, I would probably also like `@noinline` to work when applied to a field (but that is hard) or when written syntactically inside the struct instead of outside of it. But this seemed to at least get the feature working.
An example preview:
julia> @noinline struct NoinlineSingleton end
julia> @inline struct NormalSingleton end
julia> @inline struct Impossible; x::Impossible; end
ERROR: Cannot apply @inline to this struct which is self-referential or mutable
Stacktrace:
[1] top-level scope
@ REPL[5]:1
julia> sizeof(Tuple{NoinlineSingleton}) # observe this is allocated as a pointer to a singleton
8
julia> sizeof(Tuple{NormalSingleton}) # observe this is allocated inline as zero bytes
0
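For readers following along on stock Julia, the property this PR toggles is already observable via reflection with `Base.allocatedinline`. A minimal sketch (the `Point`/`Box` names are just illustrative):

```julia
# Base.allocatedinline reports whether instances of a type are stored
# inline (by value), e.g. inside an Array, or behind a pointer.
struct Point        # immutable plain-data struct: stored inline
    x::Float64
    y::Float64
end

mutable struct Box  # mutable structs always live behind a pointer
    v::Int
end

Base.allocatedinline(Point)   # true
Base.allocatedinline(Box)     # false
Base.allocatedinline(String)  # false: heap object, stored by reference
```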
This looks nice to have! However, I think it'd typically be more useful to have something like this in field position rather than as a universal decorator for a given type -- that is, I want to be able to write
struct Foo
    @inline x::Bar
end
to have `Bar` be inline in `Foo`. The reason I want that is that writing e.g.
struct Foo{T, U, V, W}
    x::Tuple{T, U, V, W}
end
is not always the same as having
struct Foo{T, U, V, W}
    a::T
    b::U
    c::V
    d::W
end
and I have no control over the definition of `Tuple`, so I can't mark it as `@inline` (even if it were a regular Julia struct).
Can we use something different than `@inline`? Having the same macro mean completely different things in different contexts feels confusing to me.
What else would you call it though? In Lisp languages (like Julia), code is simply data, and both are decisions about whether to copy data inline into a parent object or not. Should we be more precise, like `@allocateinline` (to match the pre-existing reflection function for this property, `Base.allocatedinline()`)? That does potentially make the patch much harder, though, since we would need to adapt lowering rather than hack it on afterwards like this. I realize that reasoning probably shouldn't drive our API choices, though.
@MasonProtter you are welcome to implement that, but I specifically mentioned that it is harder and a long-term idea, not one for this PR.
Yes, I think something like `@allocateinline` would be better.
I'm interested in a feature like this to be able to do atomic operations on inner structs like:
struct Foo
    @atomic data::Bar
    meta::Baz
end
If `Bar` is a non-mutable struct, then it often ends up inline-allocated in `Foo`, which makes the atomic operations very slow.
I think in an "ideal world", the atomic field would be non-inline-allocated by default (preferring word-size atomics), and then I could annotate the field with `@inlinefield` (or `@allocateinline` or w/e) to force it to be inlined if I really want that behavior.
However, I don't really want to decide that behavior globally like this PR does now. It doesn't make any more sense to declare `Bar` as globally inline-allocated than it does to declare `Bar` as globally atomic; the decision ought to be made by the user, not the definer of the datatype (although setting a default seems fine).
> which makes the atomic operations very slow.
Citation needed? It only has one extra atomic operation (on the same cache line) compared to the pointer case but elides an allocation, so it would be expected to be faster to inline in nearly all cases where it is possible.
> Citation needed? It only has one extra atomic operation (on the same cache line) compared to the pointer case but elides an allocation, so it would be expected to be faster to inline in nearly all cases where it is possible.
Sure, this example is ~113x slower when inline-allocated:
julia> struct Inner
           a::UInt
           b::UInt
           c::UInt
           d::UInt
           e::UInt
       end
julia> mutable struct Foo
           @atomic inner::Inner
           const y::UInt
       end
julia> function swapalot(o, a, b)
           for i = 1:1_000_000
               inner = @atomic :acquire o.inner
               if inner.a == 1
                   @atomic :release o.inner = a
               elseif inner.a == 0
                   @atomic :release o.inner = b
               end
           end
       end
julia> a = Inner(0,0,0,0,0); b = Inner(1,0,0,0,0); o = Foo(a, 0)
julia> using BenchmarkTools
julia> @btime swapalot(o, a, b)
22.156 ms (0 allocations: 0 bytes)
Changing the code to mutable struct Inner
speeds it up dramatically:
julia> @btime swapalot(o, a, b)
196.052 μs (0 allocations: 0 bytes)
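To see why the inline case hits the slow path, reflection in stock Julia is enough (a sketch; the point, as measured above, is that an oversized atomic field falls back to a lock):

```julia
# An @atomic field whose declared type is inline-allocated and wider than the
# hardware atomic width (typically 16 bytes on x86-64) must be guarded by a
# lock; that lock is what the inline-allocated benchmark above is paying for.
struct Wide
    a::UInt
    b::UInt
    c::UInt
    d::UInt
    e::UInt
end

mutable struct Holder
    @atomic inner::Wide
end

Base.allocatedinline(fieldtype(Holder, :inner))  # true: inline payload
sizeof(Wide)                                     # 40 bytes, too wide for cmpxchg
```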
Sure, if you are comparing two unrelated operations, that may be the case. But if you compare similar things, then the difference is much reduced:
julia> mutable struct Foo
           @atomic inner::Base.RefValue{Inner}
           const y::UInt
       end
julia> function swapalot(o, a, b)
           for i = 1:1_000_000
               inner = (@atomic :acquire o.inner)[]
               if inner.a == 1
                   @atomic :release o.inner = Ref(a)
               elseif inner.a == 0
                   @atomic :release o.inner = Ref(b)
               end
           end
       end
julia> a = Inner(0,0,0,0,0); b = Inner(1,0,0,0,0); o = Foo(Ref(a), 0);
julia> @btime swapalot(o, a, b)
11.001 ms (1000000 allocations: 45.78 MiB)
# vs 33.200 ms (0 allocations: 0 bytes) on this machine
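For what it's worth, the per-iteration allocations in the boxed version are not inherent: the `Ref` contents are never mutated, so the two boxes can be hoisted out of the loop. A self-contained sketch with stand-in types (`Inner2`/`Foo2` are hypothetical names, and this is not re-benchmarked here):

```julia
# Stand-in definitions mirroring the example above.
struct Inner2
    a::UInt
end

mutable struct Foo2
    @atomic inner::Base.RefValue{Inner2}
end

# Reuse two preallocated boxes instead of calling Ref(...) on every store.
function swapalot_prealloc(o, ra, rb)
    for i = 1:1_000
        inner = (@atomic :acquire o.inner)[]
        if inner.a == 1
            @atomic :release o.inner = ra
        elseif inner.a == 0
            @atomic :release o.inner = rb
        end
    end
end

ra = Ref(Inner2(0)); rb = Ref(Inner2(1))
o = Foo2(ra)
swapalot_prealloc(o, ra, rb)  # zero allocations inside the loop
```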
As far as I know, the remaining difference could be resolved with a better design for the lock itself, since right now it is a very generic `jl_mutex_unlock_nogc` and thus may be somewhat slow for this particular use case.