dmd do not cache instances of recursive templates

The idea is that if a template recursively calls itself, the only instance that needs to be cached in TemplateDeclaration.instances is the first one.

Oct 02 '20 09:10 WalterBright

Thanks for your pull request, @WalterBright!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#11820"

Oct 02 '20 09:10 dlang-bot

I'll look into why this isn't working tomorrow.

Oct 02 '20 09:10 WalterBright

This needs an explanation, what's the reasoning behind this?

Oct 02 '20 11:10 UplinkCoder

Here is what is happening: Error: template instance rt.util.typeinfo.TypeInfoGeneric!(ubyte, ubyte) recursive expansion exceeded allowed nesting limit

template recursion is not the recursion you think of when a function recurs. Since when a template recurs into itself, when when the visible template arguments are the same, the context is not.

Oct 02 '20 11:10 UplinkCoder

This needs an explanation, what's the reasoning behind this?

Better performance?

Oct 02 '20 13:10 nordlow

This needs an explanation, what's the reasoning behind this?

Better performance?

The lower level reasoning. I would expect the performance to actually suffer ... because less instances are reused. So there must be something I am missing.

Oct 02 '20 18:10 UplinkCoder

I'm afraid I can't tell what and where it's failing from any of the log files, except it has something to do with druntime. I can't get druntime to fail on my Linux box.

Oct 03 '20 05:10 WalterBright

make[1]: *** [posix.mak:312: generated/freebsd/debug/64/unittest/libdruntime-ut.so] Segmentation fault (core dumped)

Isn't this clear enough? It crashes building druntime unittests.

Oct 03 '20 07:10 rainers

No, it's not clear whether it was building or running the unittests when it crashed. The log has other things in it like stack traces and aborts that are apparently supposed to happen? Anyhow, I finally was able to repro it locally.

Oct 03 '20 07:10 WalterBright

found it!

Oct 04 '20 06:10 WalterBright

The Azure problem boils down to this:

template compose(fun ...)
{
    static if (fun.length == 1)
    {
	void foo() { }
	alias compose = foo;
    }
    else
	alias compose = compose!(fun[1]);
}

void test()
{
    compose!(1,1)();
    compose!(2,1)();
}

where two identical COMDATs of compose!(fun[1]) are generated, and the Microsoft linker rejects duplicate COMDATs. However, it works well enough on the other platforms that it can be evaluated for performance.

Oct 04 '20 08:10 WalterBright

@WalterBright I am still wondering why you would expect a performance increase by not caching instances? Faster lookups because the hash-table is smaller? Or less memory use because 'internal' instances can be discarded?

Oct 04 '20 08:10 UplinkCoder

Because:

the hash table can grow very large, which causes lots of memory cache misses.
the hash table holds on to all the recursive instances, meaning they can't be garbage collected

Whether this is true or not can only be determined by benchmarking one of the huge programs that is causing problems with too many templates.

Oct 04 '20 08:10 WalterBright

The only reasonable way I can think of to fix the Azure problem mentioned above is by storing COMDATs that were not put in the hash table into another hash table upon code generation to remove the duplicates, and do it for MSCOFF targets only. But it isn't worth the bother to implement that until we get a thumbs up or down on the benchmark results, which thankfully are on Linux.

Oct 04 '20 08:10 WalterBright

Well, another way is to just disable this optimization for mscoffobj.

Oct 05 '20 06:10 WalterBright

The Azure problem boils down to this:
template compose(fun ...)
{
    static if (fun.length == 1)
    {
	void foo() { }
	alias compose = foo;
    }
    else
	alias compose = compose!(fun[1]);
}

void test()
{
    compose!(1,1)();
    compose!(2,1)();
}
where two identical COMDATs of compose!(fun[1]) are generated, and the Microsoft linker rejects duplicate COMDATs. However, it works well enough on the other platforms that it can be evaluated for performance.

Sending multiple versions of the the same symbol to the backend breaks gdc and ldc too. As you can not have multiple definitions of symbols in assembly.

Oct 06 '20 07:10 ibuclaw

The way COMDATs are done is multiple COMDATs with the same name are merged into one. This works with every linker except VS's linker.

Oct 06 '20 08:10 WalterBright

See https://stackoverflow.com/questions/1834597/what-is-the-comdat-section-used-for for an explanation of COMDATs if you don't believe me :-)

Oct 06 '20 08:10 WalterBright

That's fine. But if you are outputting textual asm. it's kindof a problem using the same name twice.

Am Di., 6. Okt. 2020 um 10:13 Uhr schrieb Walter Bright < [email protected]>:

See https://stackoverflow.com/questions/1834597/what-is-the-comdat-section-used-for for an explanation of COMDATs if you don't believe me :-)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/dlang/dmd/pull/11820#issuecomment-704108186, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAVWSCG6V3VIMKIBDQPLNU3SJLGUDANCNFSM4SBPEQNA .

Oct 06 '20 08:10 UplinkCoder

See https://stackoverflow.com/questions/1834597/what-is-the-comdat-section-used-for for an explanation of COMDATs if you don't believe me :-)

Compile this code if you don't believe me. ;-)

        .globl  _D7pr118204testFZv
        .type   _D7pr118204testFZv, @function
_D7pr118204testFZv:
        .cfi_startproc
        ret
        .cfi_endproc
        .size   _D7pr118204testFZv, .-_D7pr118204testFZv
        .section        .text._D7pr1182016__T7composeVii1Z3fooFZv,"axG",@progbits,_D7pr1182016__T7composeVii1Z3fooFZv,comdat
        .weak   _D7pr1182016__T7composeVii1Z3fooFZv
        .type   _D7pr1182016__T7composeVii1Z3fooFZv, @function
_D7pr1182016__T7composeVii1Z3fooFZv:
        .cfi_startproc
        ret
        .cfi_endproc
        .size   _D7pr1182016__T7composeVii1Z3fooFZv, .-_D7pr1182016__T7composeVii1Z3fooFZv
        .section        .text._D7pr1182016__T7composeVii1Z3fooFZv,"axG",@progbits,_D7pr1182016__T7composeVii1Z3fooFZv,comdat
        .weak   _D7pr1182016__T7composeVii1Z3fooFZv
        .type   _D7pr1182016__T7composeVii1Z3fooFZv, @function
_D7pr1182016__T7composeVii1Z3fooFZv:
        .cfi_startproc
        ret
        .cfi_endproc
        .size   _D7pr1182016__T7composeVii1Z3fooFZv, .-_D7pr1182016__T7composeVii1Z3fooFZv

Oct 06 '20 10:10 ibuclaw

@ibuclaw all I can say is all the linux tests pass with multiple COMDATs in the same .o file.

Oct 06 '20 19:10 WalterBright

@ibuclaw all I can say is all the linux tests pass with multiple COMDATs in the same .o file.

They won't with ldc.

Oct 07 '20 10:10 atilaneves

They won't with ldc.

Why not? If dmd and ldc are using the same linker, why doesn't it work with ldc?

Oct 08 '20 04:10 WalterBright

They won't with ldc.

Why not? If dmd and ldc are using the same linker, why doesn't it work with ldc?

Because gdc and ldc compile to assembly (.s), not to object (.o) file.

Oct 08 '20 13:10 ibuclaw

@WalterBright are you still pursuing this?

Jan 07 '21 15:01 RazvanN7

dmd dmd copied to clipboard

do not cache instances of recursive templates

Bugzilla references

Testing this PR locally

dmd
dmd copied to clipboard