julia
julia copied to clipboard
Canonicalize names of nested functions by keeping a more fine grained counter -- per (module, method name) pair
As mentioned in https://github.com/JuliaLang/julia/pull/53716, we've been noticing that precompile statements lists from one version of our codebase often don't apply cleanly in a slightly different version.
That's because a lot of nested and anonymous function names have a global numeric suffix which is incremented every time a new name is generated, and these numeric suffixes are not very stable across codebase changes.
To solve this, this PR makes the numeric suffixes a bit more fine grained: every pair of (module, top-level/outermost function name) will have its own counter, which should make nested function names a bit more stable across different versions.
This PR applies @JeffBezanson's idea of making the symbol name changes directly in current-julia-module-counter.
Here is an example:
julia> function foo(x)
function bar(y)
return x + y
end
end
foo (generic function with 1 method)
julia> f = foo(42)
(::var"#bar#foo##0"{Int64}) (generic function with 1 method)
Serialization doesn't seem happy with this PR...
I think you need to keep this mostly inside flisp (particularly parsed_method_stack). I also don't think the module name has much value in being mangled into it. Making the name be a combination of the containing method + index into that method name seems somewhat prettier, I also don't see how that canonicalizes names any more than before. I don't really like that this adds more hidden and secret global statefulness to the lowering algorithm as well.
@vtjnash: to reiterate, our goal here is to improve the consistency of the names generated for these functions. Currently, adding one lambda inside one function can change the generated names of all nested functions across all modules in our application, which means that a number of the precompile statements emitted via --trace-compile by the previous version will fail on the new version.
This PR essentially makes the generated names/numbers independent of changes to other modules+functions.
Do you see any other way for us to accomplish this or does this explanation address your objection(s)?
across all modules
Wrong since the counter is already per-module (grep for fl_current_module_counter).
Still, as mentioned above, the proposed change would ensure that adding a nested function doesn't perturb the name generation in literally every numeric suffixed name in a given module.
We've tested this change by exercising it through compilation warmup in our application. Without this, we saw roughly 53% warmup effectiveness when upgrading versions. With this, we're seeing 68% warmup effectiveness. Although this comparison is not quite apples-to-apples, there is a notable reduction in precompile evaluation failures across versions and warmed up performance has improved. So this is a good change for us.
Ran more tests to verify the gains from this and have verified that this is a positive change for us.
Any objections to merging @JeffBezanson?
Could we use the bindings table to avoid needing this huge extra space?
Not sure if I get it?
The current implementation stores one extra hash table per module, and each hash table will have an entry for every top-level method name.
This doesn't seem that concerning from a memory consumption point of view, but I might be wrong.
Addressed most of the concerns raised above in the last commits.
In particular:
- Canonicalized names are no longer using
<>or module names. They are using##to keep consistency with other names for callable types.
Example:
julia> function foo(x)
function bar(y)
return x + y
end
end
foo (generic function with 1 method)
julia> f = foo(42)
(::var"#bar#foo##0"{Int64}) (generic function with 1 method)
-
Counter table is now local to every module.
-
The parsed method stack is implemented in Scheme.
Unfortunately I discovered this morning a segfault on this PR:
$ cat src/TestProject.jl
module TestProject
function foo(x::Float64)
return (y)->x + y
end
end # module TestProject
julia> using TestProject
julia> Core.eval(TestProject, :(function foo(x::Int)
return (y)->x+y
end))
[334559] signal 11 (1): Segmentation fault
in expression starting at none:1
ptrhash_get at /home/topolarity/repos/julia/src/support/ptrhash.c:26
unknown function (ip: 0x7fff5166d51f)
Allocations: 5045387 (Pool: 5044906; Big: 481); GC: 6
[1] 334559 segmentation fault ../../julia/julia --project=. -q
This seems related to an important problem that @vtjnash pointed out to me yesterday. The counter_table field will be lost across any serialization boundaries for a Module, which can cause us to accidentally emit the same typename for multiple different anonymous functions (leading to corruption).
The solution would seem to be either to:
- Add specific serialization support for
counter_tableto preserve it across the serialization boundary, OR - Check in the binding table directly for type name collisions
That segfault and the serialization problem need to be resolved, but I'm ready to approve once they are
The solution would seem to be either to:
- Add specific serialization support for counter_table to preserve it across the serialization boundary, OR
- Check in the binding table directly for type name collisions
Did a mix of both and lazily reconstructed the counter_table of a freshly serialized module through successive lookups in the binding table.
Performance doesn't seem terrible, but a few parts could probably be optimized.
I'm not sure whether or not we actually delete bindings right now, but that seems like an unfortunate fragility.
Yes, that's fair. Though if we plan on using the bindings table to save space from the counter_table (as @vtjnash suggested here), then it seems we might suffer from the same issue?
Bump?
@JeffBezanson sent his OK on the approach described here.
@vtjnash, @topolarity: any further objections on merging this?
:+1: Looks good now, just needs finishing touches.
Are you able to make those small cleanups soon? It would be nice to get it in before https://github.com/JuliaLang/julia/pull/54788 causes conflicts
Are you able to make those small cleanups soon? It would be nice to get it in before https://github.com/JuliaLang/julia/pull/54788 causes conflicts
Yes. Plan to take a look at this over the weekend.
I don't think https://github.com/JuliaLang/julia/pull/54788 by itself will cause issues since existence in the binding table is still semantic on that branch.
@vtjnash: I believe I addressed the remaining comments.
Any additional requests?
Good to merge?
Plan to take a look at this over the weekend.
Unfortunately was not able to meet this deadline. Apologies for the delay here.
Bump?
Bump?
Thanks! This took some perseverance @d-netto , but it's time to land