julia icon indicating copy to clipboard operation
julia copied to clipboard

Deeply recursive types highlight performance issue in sysimg creation, type_in_worklist is inefficient.

Open simsurace opened this issue 1 year ago • 8 comments

I have been experiencing long delays between finishing a precompilation workload (using PrecompileTools.jl) and exiting precompilation. In the full-scale example, the numbers were

  • First call to the function without PrecompileTools: ~40seconds
  • First call to the function during precompilation: ~20seconds
  • Time between the function returning and precompilation finishing (start of test suite): ~1200seconds

I have been able to reproduce the issue in a small package: https://github.com/simsurace/WeirdPrecompilationProblem.jl The numbers are much smaller for the MRE, but there is still a large delay of approx. 2minutes between the function returning and precompilation finishing on all of my machines (Linux and MacOS).

This is a follow-up from a discussion with @vchuravy on Slack.

simsurace avatar Feb 14 '24 14:02 simsurace

I assume this is 1.10?

vchuravy avatar Feb 14 '24 15:02 vchuravy

Yes, the Manifest is included in the package.

simsurace avatar Feb 14 '24 15:02 simsurace

image

We spend a lot of time in sysimg_dump.

Most of it is in:

image

vchuravy avatar Feb 14 '24 16:02 vchuravy

image

vchuravy avatar Feb 14 '24 16:02 vchuravy

cc: @timholy and @vtjnash since this is cbfdb3f

vchuravy avatar Feb 14 '24 16:02 vchuravy

Sure, someone could throw an htable cache around that to avoid duplicate work on deep trees

vtjnash avatar Feb 14 '24 17:02 vtjnash

Any pointers to where the htable would need to be inserted? My hunch would be to use it to save the results of has_backedge_to_worklist, hashing by method instance.

simsurace avatar Feb 15 '24 09:02 simsurace

type_in_worklist would need to memoize it. The performance issue here is likely basically that we check has_backedge_to_worklist on the outermost wrapper, which then queues that object to be serialized into the hashmap of work, and recurses to check that property again on all of the content, queueing all of those objects as it goes along

vtjnash avatar Feb 16 '24 16:02 vtjnash

@vtjnash would you insert the cache into type_in_worklist or provide a version that memoizes on the outer layer?

@simsurace the latter would look roughly like this, and then one would need to thread the cache through.

**static int type_in_worklist_cached(htable *cache, jl_value_t *v)
{
    void **bp = ptrhash_bp(cache, (void*)v);
    // HT_NOTFOUND: not yet analyzed
    // HT_NOTFOUND + 1: not in worklist
    // HT_NOTFOUND + 2: in worklist 
    int found = (char*)*bp - (char*)HT_NOTFOUND;
    if (found == 0) {
        found = type_in_worklist(v) + 1;
        *bp = (void*)((char*)HT_NOTFOUND + found);
    }
    return found - 1;
}

vchuravy avatar Mar 13 '24 15:03 vchuravy

I think every type_in_worklist call needs to be augmented to memoize the result

vtjnash avatar Mar 13 '24 16:03 vtjnash