dmd
                                
                                 dmd copied to clipboard
                                
                                    dmd copied to clipboard
                            
                            
                            
                        do not cache instances of recursive templates
The idea is that if a template recursively calls itself, the only instance that needs to be cached in TemplateDeclaration.instances is the first one.
Thanks for your pull request, @WalterBright!
Bugzilla references
Your PR doesn't reference any Bugzilla issue.
If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub run digger -- build "master + dmd#11820"
I'll look into why this isn't working tomorrow.
This needs an explanation, what's the reasoning behind this?
Here is what is happening:  Error: template instance rt.util.typeinfo.TypeInfoGeneric!(ubyte, ubyte) recursive expansion exceeded allowed nesting limit
template recursion is not the recursion you think of when a function recurs. Since when a template recurs into itself, when when the visible template arguments are the same, the context is not.
This needs an explanation, what's the reasoning behind this?
Better performance?
This needs an explanation, what's the reasoning behind this?
Better performance?
The lower level reasoning. I would expect the performance to actually suffer ... because less instances are reused. So there must be something I am missing.
I'm afraid I can't tell what and where it's failing from any of the log files, except it has something to do with druntime. I can't get druntime to fail on my Linux box.
make[1]: *** [posix.mak:312: generated/freebsd/debug/64/unittest/libdruntime-ut.so] Segmentation fault (core dumped)
Isn't this clear enough? It crashes building druntime unittests.
No, it's not clear whether it was building or running the unittests when it crashed. The log has other things in it like stack traces and aborts that are apparently supposed to happen? Anyhow, I finally was able to repro it locally.
found it!
The Azure problem boils down to this:
template compose(fun ...)
{
    static if (fun.length == 1)
    {
	void foo() { }
	alias compose = foo;
    }
    else
	alias compose = compose!(fun[1]);
}
void test()
{
    compose!(1,1)();
    compose!(2,1)();
}
where two identical COMDATs of compose!(fun[1]) are generated, and the Microsoft linker rejects duplicate COMDATs. However, it works well enough on the other platforms that it can be evaluated for performance.
@WalterBright I am still wondering why you would expect a performance increase by not caching instances? Faster lookups because the hash-table is smaller? Or less memory use because 'internal' instances can be discarded?
Because:
- the hash table can grow very large, which causes lots of memory cache misses.
- the hash table holds on to all the recursive instances, meaning they can't be garbage collected
Whether this is true or not can only be determined by benchmarking one of the huge programs that is causing problems with too many templates.
The only reasonable way I can think of to fix the Azure problem mentioned above is by storing COMDATs that were not put in the hash table into another hash table upon code generation to remove the duplicates, and do it for MSCOFF targets only. But it isn't worth the bother to implement that until we get a thumbs up or down on the benchmark results, which thankfully are on Linux.
Well, another way is to just disable this optimization for mscoffobj.
The Azure problem boils down to this:
template compose(fun ...) { static if (fun.length == 1) { void foo() { } alias compose = foo; } else alias compose = compose!(fun[1]); } void test() { compose!(1,1)(); compose!(2,1)(); }where two identical COMDATs of
compose!(fun[1])are generated, and the Microsoft linker rejects duplicate COMDATs. However, it works well enough on the other platforms that it can be evaluated for performance.
Sending multiple versions of the the same symbol to the backend breaks gdc and ldc too. As you can not have multiple definitions of symbols in assembly.
The way COMDATs are done is multiple COMDATs with the same name are merged into one. This works with every linker except VS's linker.
See https://stackoverflow.com/questions/1834597/what-is-the-comdat-section-used-for for an explanation of COMDATs if you don't believe me :-)
That's fine. But if you are outputting textual asm. it's kindof a problem using the same name twice.
Am Di., 6. Okt. 2020 um 10:13 Uhr schrieb Walter Bright < [email protected]>:
See https://stackoverflow.com/questions/1834597/what-is-the-comdat-section-used-for for an explanation of COMDATs if you don't believe me :-)
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/dlang/dmd/pull/11820#issuecomment-704108186, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAVWSCG6V3VIMKIBDQPLNU3SJLGUDANCNFSM4SBPEQNA .
See https://stackoverflow.com/questions/1834597/what-is-the-comdat-section-used-for for an explanation of COMDATs if you don't believe me :-)
Compile this code if you don't believe me. ;-)
        .globl  _D7pr118204testFZv
        .type   _D7pr118204testFZv, @function
_D7pr118204testFZv:
        .cfi_startproc
        ret
        .cfi_endproc
        .size   _D7pr118204testFZv, .-_D7pr118204testFZv
        .section        .text._D7pr1182016__T7composeVii1Z3fooFZv,"axG",@progbits,_D7pr1182016__T7composeVii1Z3fooFZv,comdat
        .weak   _D7pr1182016__T7composeVii1Z3fooFZv
        .type   _D7pr1182016__T7composeVii1Z3fooFZv, @function
_D7pr1182016__T7composeVii1Z3fooFZv:
        .cfi_startproc
        ret
        .cfi_endproc
        .size   _D7pr1182016__T7composeVii1Z3fooFZv, .-_D7pr1182016__T7composeVii1Z3fooFZv
        .section        .text._D7pr1182016__T7composeVii1Z3fooFZv,"axG",@progbits,_D7pr1182016__T7composeVii1Z3fooFZv,comdat
        .weak   _D7pr1182016__T7composeVii1Z3fooFZv
        .type   _D7pr1182016__T7composeVii1Z3fooFZv, @function
_D7pr1182016__T7composeVii1Z3fooFZv:
        .cfi_startproc
        ret
        .cfi_endproc
        .size   _D7pr1182016__T7composeVii1Z3fooFZv, .-_D7pr1182016__T7composeVii1Z3fooFZv
@ibuclaw all I can say is all the linux tests pass with multiple COMDATs in the same .o file.
@ibuclaw all I can say is all the linux tests pass with multiple COMDATs in the same .o file.
They won't with ldc.
They won't with ldc.
Why not? If dmd and ldc are using the same linker, why doesn't it work with ldc?
They won't with ldc.
Why not? If dmd and ldc are using the same linker, why doesn't it work with ldc?
Because gdc and ldc compile to assembly (.s), not to object (.o) file.
@WalterBright are you still pursuing this?