julia
julia copied to clipboard
Base.summarysize returns different memory usage values on different runs
MWE:
mutable struct B
x::Union{Float64, Tuple{Float64, Float64}}
y::Union{Float64, Tuple{Float64, Float64}}
end
ex = B[B(rand(), (rand(),rand())) for _ in 1:10^5];
now if you execute:
julia> Base.summarysize(ex)
7776776
julia> Base.summarysize(ex)
7820648
julia> Base.summarysize(ex)
7923000
julia> Base.summarysize(ex)
7911824
This can become very extreme in some situation, I saw a 2x difference between runs on a more complex vector of structs.
Tested in 1.10 and nightly, same behaviour
The trouble comes from doing the wrong query here: https://github.com/JuliaLang/julia/blob/c42df604d99c521ad721e64d8be6286cd597e8f9/base/summarysize.jl#L60
Versus a more correct implementation of the function, as seen here that uses isptr
to avoid double-counting:
https://github.com/JuliaLang/julia/blob/c42df604d99c521ad721e64d8be6286cd597e8f9/base/compiler/utilities.jl#L95
Hey @vtjnash, I would like to work on this issue it would be great if you can guide me. Thanks.
It looks like this has been a problem since 1.0, so it may need to be reverse ported.
PS T:\> julia +1.10 .\53061-summarysize.jl
[7915808, 7931712, 7954968, 7967504, 7987304, 8000040, 7990696, 7982432, 7967640, 7956784]
PS T:\> julia +1.9 .\53061-summarysize.jl
[7721968, 7877112, 7937216, 7974416, 7981720, 8000040, 7999040, 7984648, 7984824, 7971424]
PS T:\> julia +1.8 .\53061-summarysize.jl
[7626120, 7933096, 7917768, 7961184, 7926904, 7968272, 7980464, 7999608, 8000040, 7988592]
PS T:\> julia +1.6 .\53061-summarysize.jl
[7330136, 7925376, 7940840, 7980024, 7981240, 7999712, 8000040, 7934824, 7980592, 7964688]
PS T:\> julia +1.0 .\53061-summarysize.jl
[7312352, 7557528, 7501648, 7327928, 7351744, 7391160, 7510224, 7323608, 7288408, 7399816]
PS T:\> cat .\53061-summarysize.jl
mutable struct B
x::Union{Float64, Tuple{Float64, Float64}}
y::Union{Float64, Tuple{Float64, Float64}}
end
ex = B[B(rand(), (rand(),rand())) for _ in 1:10^5];
[ Base.summarysize(ex) for _ in 1:10 ] |> println
I don't think that could be the same, since the Union-storage optimization (that is currently resulting in unreliable over-estimation, wasn't implemented until later)
@re1san If you take a look at the difference between those two implementations, do you see what parts of the base/compiler/utilities.jl there could be ported to the base/summarysize.jl so that the estimate by summarysize is the same as the accurate value computed by the compiler code?
By the way this can even invert the memory efficiency of structs and mutable structs:
struct Z end
struct A
b::Union{Z, Tuple{Float64, Float64}}
d::Union{Z, Int}
c::Union{Z, Symbol}
end
mutable struct B
b::Union{Z, Tuple{Float64, Float64}}
d::Union{Z, Int}
c::Union{Z, Symbol}
end
vec_notmut = A[A(Z(), 1, :s) for _ in 1:10^6];
vec_mut = B[B(Z(), 1, :s) for _ in 1:10^6];
which results in
julia> Base.summarysize(vec_notmut)
83717104
julia> Base.summarysize(vec_mut)
56000064
(if it is the same problem, I'm a bit of unsure because here summarysize for vec_mut
is stable while for vec_notmut
is not)
Somewhat unclear, but it is at least true that Array/Memory does use a different branch condition when deciding whether to double-count the memory usage of the elements: https://github.com/JuliaLang/julia/blob/c42df604d99c521ad721e64d8be6286cd597e8f9/base/summarysize.jl#L143
So, would
nf = nfields(x)
- ft = typeof(x).types
- if !isbitstype(ft[i]) && isdefined(x, i)
- val = getfield(x, i)
+ dt = typeof(x)
+ dtfd = Base.DataTypeFieldDesc(dt)
+ if isdefined(x, i)
+ f = getfield(x, i)
+ if dtfd[i].isptr || !Base.datatype_pointerfree(typeof(f))
+ val = f
+ end
end
end
take care of https://github.com/JuliaLang/julia/issues/53061#issue-2101424064?
julia> Base.summarysize(ex)
5600056
julia> Base.summarysize(ex)
5600056
julia> Base.summarysize(ex)
5600056
The second case (vec_notmut
) still oscillates.
I checked it out your implementation @xlxs4, I think I understood the logic and it seems fine to me
Could you open up a PR with that @xlxs4?