julia icon indicating copy to clipboard operation
julia copied to clipboard

Canonicalize names of nested functions by keeping a more fine grained counter -- per (module, method name) pair

Open d-netto opened this issue 1 year ago • 16 comments

As mentioned in https://github.com/JuliaLang/julia/pull/53716, we've been noticing that precompile statements lists from one version of our codebase often don't apply cleanly in a slightly different version.

That's because a lot of nested and anonymous function names have a global numeric suffix which is incremented every time a new name is generated, and these numeric suffixes are not very stable across codebase changes.

To solve this, this PR makes the numeric suffixes a bit more fine grained: every pair of (module, top-level/outermost function name) will have its own counter, which should make nested function names a bit more stable across different versions.

This PR applies @JeffBezanson's idea of making the symbol name changes directly in current-julia-module-counter.

Here is an example:

julia> function foo(x)
           function bar(y)
               return x + y
           end
       end
foo (generic function with 1 method)

julia> f = foo(42)
(::var"#bar#foo##0"{Int64}) (generic function with 1 method)

d-netto avatar Mar 13 '24 04:03 d-netto

Serialization doesn't seem happy with this PR...

d-netto avatar Mar 13 '24 12:03 d-netto

I think you need to keep this mostly inside flisp (particularly parsed_method_stack). I also don't think the module name has much value in being mangled into it. Making the name be a combination of the containing method + index into that method name seems somewhat prettier, I also don't see how that canonicalizes names any more than before. I don't really like that this adds more hidden and secret global statefulness to the lowering algorithm as well.

@vtjnash: to reiterate, our goal here is to improve the consistency of the names generated for these functions. Currently, adding one lambda inside one function can change the generated names of all nested functions across all modules in our application, which means that a number of the precompile statements emitted via --trace-compile by the previous version will fail on the new version.

This PR essentially makes the generated names/numbers independent of changes to other modules+functions.

Do you see any other way for us to accomplish this or does this explanation address your objection(s)?

kpamnany avatar Mar 20 '24 16:03 kpamnany

across all modules

Wrong since the counter is already per-module (grep for fl_current_module_counter).

Still, as mentioned above, the proposed change would ensure that adding a nested function doesn't perturb the name generation in literally every numeric suffixed name in a given module.

d-netto avatar Mar 20 '24 22:03 d-netto

We've tested this change by exercising it through compilation warmup in our application. Without this, we saw roughly 53% warmup effectiveness when upgrading versions. With this, we're seeing 68% warmup effectiveness. Although this comparison is not quite apples-to-apples, there is a notable reduction in precompile evaluation failures across versions and warmed up performance has improved. So this is a good change for us.

kpamnany avatar Mar 22 '24 17:03 kpamnany

Ran more tests to verify the gains from this and have verified that this is a positive change for us.

Any objections to merging @JeffBezanson?

kpamnany avatar Mar 26 '24 19:03 kpamnany

Could we use the bindings table to avoid needing this huge extra space?

Not sure if I get it?

The current implementation stores one extra hash table per module, and each hash table will have an entry for every top-level method name.

This doesn't seem that concerning from a memory consumption point of view, but I might be wrong.

d-netto avatar Apr 02 '24 04:04 d-netto

Addressed most of the concerns raised above in the last commits.

In particular:

  • Canonicalized names are no longer using <> or module names. They are using ## to keep consistency with other names for callable types.

Example:

julia> function foo(x)
           function bar(y)
               return x + y
           end
       end
foo (generic function with 1 method)

julia> f = foo(42)
(::var"#bar#foo##0"{Int64}) (generic function with 1 method)
  • Counter table is now local to every module.

  • The parsed method stack is implemented in Scheme.

d-netto avatar Apr 02 '24 15:04 d-netto

Unfortunately I discovered this morning a segfault on this PR:

$ cat src/TestProject.jl
module TestProject
function foo(x::Float64)
    return (y)->x + y
end
end # module TestProject
julia> using TestProject
julia> Core.eval(TestProject, :(function foo(x::Int)
           return (y)->x+y
       end))
[334559] signal 11 (1): Segmentation fault
in expression starting at none:1
ptrhash_get at /home/topolarity/repos/julia/src/support/ptrhash.c:26
unknown function (ip: 0x7fff5166d51f)
Allocations: 5045387 (Pool: 5044906; Big: 481); GC: 6
[1]    334559 segmentation fault  ../../julia/julia --project=. -q

This seems related to an important problem that @vtjnash pointed out to me yesterday. The counter_table field will be lost across any serialization boundaries for a Module, which can cause us to accidentally emit the same typename for multiple different anonymous functions (leading to corruption).

The solution would seem to be either to:

  • Add specific serialization support for counter_table to preserve it across the serialization boundary, OR
  • Check in the binding table directly for type name collisions

That segfault and the serialization problem need to be resolved, but I'm ready to approve once they are

topolarity avatar Apr 10 '24 12:04 topolarity

The solution would seem to be either to:

  • Add specific serialization support for counter_table to preserve it across the serialization boundary, OR
  • Check in the binding table directly for type name collisions

Did a mix of both and lazily reconstructed the counter_table of a freshly serialized module through successive lookups in the binding table.

Performance doesn't seem terrible, but a few parts could probably be optimized.

d-netto avatar Apr 10 '24 17:04 d-netto

I'm not sure whether or not we actually delete bindings right now, but that seems like an unfortunate fragility.

Yes, that's fair. Though if we plan on using the bindings table to save space from the counter_table (as @vtjnash suggested here), then it seems we might suffer from the same issue?

d-netto avatar Apr 11 '24 13:04 d-netto

Bump?

d-netto avatar Apr 22 '24 11:04 d-netto

@JeffBezanson sent his OK on the approach described here.

@vtjnash, @topolarity: any further objections on merging this?

d-netto avatar May 06 '24 17:05 d-netto

:+1: Looks good now, just needs finishing touches.

JeffBezanson avatar Jul 23 '24 20:07 JeffBezanson

Are you able to make those small cleanups soon? It would be nice to get it in before https://github.com/JuliaLang/julia/pull/54788 causes conflicts

vtjnash avatar Aug 15 '24 18:08 vtjnash

Are you able to make those small cleanups soon? It would be nice to get it in before https://github.com/JuliaLang/julia/pull/54788 causes conflicts

Yes. Plan to take a look at this over the weekend.

d-netto avatar Aug 15 '24 19:08 d-netto

I don't think https://github.com/JuliaLang/julia/pull/54788 by itself will cause issues since existence in the binding table is still semantic on that branch.

Keno avatar Aug 15 '24 19:08 Keno

@vtjnash: I believe I addressed the remaining comments.

Any additional requests?

Good to merge?

d-netto avatar Aug 28 '24 18:08 d-netto

Plan to take a look at this over the weekend.

Unfortunately was not able to meet this deadline. Apologies for the delay here.

d-netto avatar Aug 28 '24 18:08 d-netto

Bump?

d-netto avatar Aug 31 '24 16:08 d-netto

Bump?

d-netto avatar Sep 04 '24 15:09 d-netto

Thanks! This took some perseverance @d-netto , but it's time to land

topolarity avatar Sep 06 '24 19:09 topolarity