Deeply recursive types highlight performance issue in sysimg creation, type_in_worklist is inefficient.
I have been experiencing long delays between finishing a precompilation workload (using PrecompileTools.jl) and exiting precompilation. In the full-scale example, the numbers were
- First call to the function without PrecompileTools: ~40seconds
- First call to the function during precompilation: ~20seconds
- Time between the function returning and precompilation finishing (start of test suite): ~1200seconds
I have been able to reproduce the issue in a small package: https://github.com/simsurace/WeirdPrecompilationProblem.jl The numbers are much smaller for the MRE, but there is still a large delay of approx. 2minutes between the function returning and precompilation finishing on all of my machines (Linux and MacOS).
This is a follow-up from a discussion with @vchuravy on Slack.
I assume this is 1.10?
Yes, the Manifest is included in the package.
We spend a lot of time in sysimg_dump.
Most of it is in:
cc: @timholy and @vtjnash since this is cbfdb3f
Sure, someone could throw an htable cache around that to avoid duplicate work on deep trees
Any pointers to where the htable would need to be inserted? My hunch would be to use it to save the results of has_backedge_to_worklist, hashing by method instance.
type_in_worklist would need to memoize it. The performance issue here is likely basically that we check has_backedge_to_worklist on the outermost wrapper, which then queues that object to be serialized into the hashmap of work, and recurses to check that property again on all of the content, queueing all of those objects as it goes along
@vtjnash would you insert the cache into type_in_worklist or provide a version that memoizes on the outer layer?
@simsurace the latter would look roughly like this, and then one would need to thread the cache through.
**static int type_in_worklist_cached(htable *cache, jl_value_t *v)
{
void **bp = ptrhash_bp(cache, (void*)v);
// HT_NOTFOUND: not yet analyzed
// HT_NOTFOUND + 1: not in worklist
// HT_NOTFOUND + 2: in worklist
int found = (char*)*bp - (char*)HT_NOTFOUND;
if (found == 0) {
found = type_in_worklist(v) + 1;
*bp = (void*)((char*)HT_NOTFOUND + found);
}
return found - 1;
}
I think every type_in_worklist call needs to be augmented to memoize the result