dmd icon indicating copy to clipboard operation
dmd copied to clipboard

Implement some of MSVC's intrinsics, for ImportC.

Open just-harry opened this issue 1 year ago • 13 comments

This PR implements all but three of the MSVC intrinsics listed here: https://web.archive.org/web/20240412171516/https://learn.microsoft.com/en-ie/cpp/intrinsics/alphabetical-listing-of-intrinsic-functions?view=msvc-170, and a handful of undocumented intrinsics.

The three unimplemented intrinsics are _AddressOfReturnAddress, __getcallerseflags, and _ReturnAddress, because they need compiler support for their implementation.

The implemented intrinsics are listed in this expando.
  • __assume
  • __umulh
  • __mulh
  • _umul128
  • _mul128
  • __emul
  • __emulu
  • _div128
  • _udiv128
  • _div64
  • _udiv64
  • __cpuid
  • __cpuidex
  • _cvt_ftoi_fast
  • _cvt_ftoll_fast
  • _cvt_ftoui_fast
  • _cvt_ftoull_fast
  • _cvt_dtoi_fast
  • _cvt_dtoll_fast
  • _cvt_dtoui_fast
  • _cvt_dtoull_fast
  • _cvt_ftoi_sat
  • _cvt_ftoll_sat
  • _cvt_ftoui_sat
  • _cvt_ftoull_sat
  • _cvt_dtoi_sat
  • _cvt_dtoll_sat
  • _cvt_dtoui_sat
  • _cvt_dtoull_sat
  • _cvt_ftoi_sent
  • _cvt_ftoll_sent
  • _cvt_ftoui_sent
  • _cvt_ftoull_sent
  • _cvt_dtoi_sent
  • _cvt_dtoui_sent
  • _cvt_dtoll_sent
  • _cvt_dtoull_sent
  • __readgsbyte
  • __readgsword
  • __readgsdword
  • __readgsqword
  • __writegsbyte
  • __writegsword
  • __writegsdword
  • __writegsqword
  • __addgsbyte
  • __addgsword
  • __addgsdword
  • __addgsqword
  • __incgsbyte
  • __incgsword
  • __incgsdword
  • __incgsqword
  • __readfsbyte
  • __readfsword
  • __readfsdword
  • __readfsqword
  • __writefsbyte
  • __writefsword
  • __writefsdword
  • __writefsqword
  • __addfsbyte
  • __addfsword
  • __addfsdword
  • __incfsbyte
  • __incfsword
  • __incfsdword
  • __debugbreak
  • __fastfail
  • __faststorefence
  • _disable
  • _enable
  • _interlockedadd [Undocumented]
  • _interlockedadd64 [Undocumented]
  • _InterlockedAdd
  • _InterlockedAdd_acq
  • _InterlockedAdd_rel
  • _InterlockedAdd_nf
  • _InterlockedAdd64
  • _InterlockedAdd64_acq
  • _InterlockedAdd64_rel
  • _InterlockedAdd64_nf
  • _InterlockedAddLargeStatistic
  • _InterlockedAnd
  • _InterlockedAnd8
  • _InterlockedAnd16
  • _interlockedand64 [Undocumented]
  • _InterlockedAnd_acq
  • _InterlockedAnd_rel
  • _InterlockedAnd_nf
  • _InterlockedAnd8_acq
  • _InterlockedAnd8_rel
  • _InterlockedAnd8_nf
  • _InterlockedAnd16_acq
  • _InterlockedAnd16_rel
  • _InterlockedAnd16_nf
  • _InterlockedAnd64_acq
  • _InterlockedAnd64_rel
  • _InterlockedAnd64_nf
  • _InterlockedAnd64
  • _InterlockedAnd_np
  • _InterlockedAnd8_np
  • _InterlockedAnd16_np
  • _InterlockedAnd64_np
  • _InterlockedAnd64_HLEAcquire
  • _InterlockedAnd64_HLERelease
  • _InterlockedAnd_HLEAcquire
  • _InterlockedAnd_HLERelease
  • _interlockedbittestandreset
  • _interlockedbittestandreset64
  • _interlockedbittestandreset_HLEAcquire
  • _interlockedbittestandreset_HLERelease
  • _interlockedbittestandreset64_HLEAcquire
  • _interlockedbittestandreset64_HLERelease
  • _interlockedbittestandreset_acq
  • _interlockedbittestandreset_rel
  • _interlockedbittestandreset_nf
  • _interlockedbittestandreset64_acq
  • _interlockedbittestandreset64_rel
  • _interlockedbittestandreset64_nf
  • _interlockedbittestandset
  • _interlockedbittestandset64
  • _interlockedbittestandset_HLEAcquire
  • _interlockedbittestandset_HLERelease
  • _interlockedbittestandset64_HLEAcquire
  • _interlockedbittestandset64_HLERelease
  • _interlockedbittestandset_acq
  • _interlockedbittestandset_rel
  • _interlockedbittestandset_nf
  • _interlockedbittestandset64_acq
  • _interlockedbittestandset64_rel
  • _interlockedbittestandset64_nf
  • _InterlockedCompareExchange
  • _InterlockedCompareExchange8
  • _InterlockedCompareExchange16
  • _InterlockedCompareExchange64
  • _InterlockedCompareExchange_HLEAcquire
  • _InterlockedCompareExchange_HLERelease
  • _InterlockedCompareExchange64_HLEAcquire
  • _InterlockedCompareExchange64_HLERelease
  • _InterlockedCompareExchange_np
  • _InterlockedCompareExchange16_np
  • _InterlockedCompareExchange64_np
  • _InterlockedCompareExchange_acq
  • _InterlockedCompareExchange_rel
  • _InterlockedCompareExchange_nf
  • _InterlockedCompareExchange8_acq
  • _InterlockedCompareExchange8_rel
  • _InterlockedCompareExchange8_nf
  • _InterlockedCompareExchange16_acq
  • _InterlockedCompareExchange16_rel
  • _InterlockedCompareExchange16_nf
  • _InterlockedCompareExchange64_acq
  • _InterlockedCompareExchange64_rel
  • _InterlockedCompareExchange64_nf
  • _InterlockedCompareExchange128
  • _InterlockedCompareExchange128_np
  • _InterlockedCompareExchange128_acq
  • _InterlockedCompareExchange128_rel
  • _InterlockedCompareExchange128_nf
  • _InterlockedCompareExchangePointer
  • _InterlockedCompareExchangePointer_HLEAcquire
  • _InterlockedCompareExchangePointer_HLERelease
  • _InterlockedCompareExchangePointer_np
  • _InterlockedCompareExchangePointer_acq
  • _InterlockedCompareExchangePointer_rel
  • _InterlockedCompareExchangePointer_nf
  • _InterlockedDecrement
  • _InterlockedDecrement16
  • _interlockeddecrement64
  • _InterlockedDecrement64
  • _InterlockedDecrement_acq
  • _InterlockedDecrement_rel
  • _InterlockedDecrement_nf
  • _InterlockedDecrement16_acq
  • _InterlockedDecrement16_rel
  • _InterlockedDecrement16_nf
  • _InterlockedDecrement64_acq
  • _InterlockedDecrement64_rel
  • _InterlockedDecrement64_nf
  • _InterlockedExchange
  • _InterlockedExchange8
  • _InterlockedExchange16
  • _interlockedexchange64
  • _InterlockedExchange64
  • _InterlockedExchange_HLEAcquire
  • _InterlockedExchange_HLERelease
  • _InterlockedExchange64_HLEAcquire
  • _InterlockedExchange64_HLERelease
  • _InterlockedExchange_acq
  • _InterlockedExchange_rel
  • _InterlockedExchange_nf
  • _InterlockedExchange8_acq
  • _InterlockedExchange8_rel
  • _InterlockedExchange8_nf
  • _InterlockedExchange16_acq
  • _InterlockedExchange16_rel
  • _InterlockedExchange16_nf
  • _InterlockedExchange64_acq
  • _InterlockedExchange64_rel
  • _InterlockedExchange64_nf
  • _InterlockedExchangeAdd
  • _InterlockedExchangeAdd8
  • _InterlockedExchangeAdd16
  • _interlockedexchangeadd64 [Undocumented]
  • _InterlockedExchangeAdd64
  • _InterlockedExchangeAdd_HLEAcquire
  • _InterlockedExchangeAdd_HLERelease
  • _InterlockedExchangeAdd64_HLEAcquire
  • _InterlockedExchangeAdd64_HLERelease
  • _InterlockedExchangeAdd_acq
  • _InterlockedExchangeAdd_rel
  • _InterlockedExchangeAdd_nf
  • _InterlockedExchangeAdd8_acq
  • _InterlockedExchangeAdd8_rel
  • _InterlockedExchangeAdd8_nf
  • _InterlockedExchangeAdd16_acq
  • _InterlockedExchangeAdd16_rel
  • _InterlockedExchangeAdd16_nf
  • _InterlockedExchangeAdd64_acq
  • _InterlockedExchangeAdd64_rel
  • _InterlockedExchangeAdd64_nf
  • _InterlockedExchangePointer
  • _InterlockedExchangePointer_HLEAcquire
  • _InterlockedExchangePointer_HLERelease
  • _InterlockedExchangePointer_acq
  • _InterlockedExchangePointer_rel
  • _InterlockedExchangePointer_nf
  • _InterlockedIncrement
  • _InterlockedIncrement16
  • _interlockedincrement64 [Undocumented]
  • _InterlockedIncrement64
  • _InterlockedIncrement_acq
  • _InterlockedIncrement_rel
  • _InterlockedIncrement_nf
  • _InterlockedIncrement16_acq
  • _InterlockedIncrement16_rel
  • _InterlockedIncrement16_nf
  • _InterlockedIncrement64_acq
  • _InterlockedIncrement64_rel
  • _InterlockedIncrement64_nf
  • _InterlockedOr
  • _InterlockedOr8
  • _InterlockedOr16
  • _interlockedor64 [Undocumented]
  • _InterlockedOr_acq
  • _InterlockedOr_rel
  • _InterlockedOr_nf
  • _InterlockedOr8_acq
  • _InterlockedOr8_rel
  • _InterlockedOr8_nf
  • _InterlockedOr16_acq
  • _InterlockedOr16_rel
  • _InterlockedOr16_nf
  • _InterlockedOr64_acq
  • _InterlockedOr64_rel
  • _InterlockedOr64_nf
  • _InterlockedOr64
  • _InterlockedOr_np
  • _InterlockedOr8_np
  • _InterlockedOr16_np
  • _InterlockedOr64_np
  • _InterlockedOr64_HLEAcquire
  • _InterlockedOr64_HLERelease
  • _InterlockedOr_HLEAcquire
  • _InterlockedOr_HLERelease
  • _InterlockedXor
  • _InterlockedXor8
  • _InterlockedXor16
  • _interlockedxor64 [Undocumented]
  • _InterlockedXor_acq
  • _InterlockedXor_rel
  • _InterlockedXor_nf
  • _InterlockedXor8_acq
  • _InterlockedXor8_rel
  • _InterlockedXor8_nf
  • _InterlockedXor16_acq
  • _InterlockedXor16_rel
  • _InterlockedXor16_nf
  • _InterlockedXor64_acq
  • _InterlockedXor64_rel
  • _InterlockedXor64_nf
  • _InterlockedXor64
  • _InterlockedXor_np
  • _InterlockedXor8_np
  • _InterlockedXor16_np
  • _InterlockedXor64_np
  • _InterlockedXor64_HLEAcquire
  • _InterlockedXor64_HLERelease
  • _InterlockedXor_HLEAcquire
  • _InterlockedXor_HLERelease
  • __inbyte
  • __inword
  • __indword
  • __outbyte
  • __outword
  • __outdword
  • __inbytestring
  • __inwordstring
  • __indwordstring
  • __outbytestring
  • __outwordstring
  • __outdwordstring
  • __int2c
  • __invlpg
  • __lidt
  • __ll_lshift
  • __ll_rshift
  • __ull_rshift
  • __lzcnt16
  • __lzcnt
  • __lzcnt64
  • _mm_cvtsi64x_ss
  • _mm_cvtss_si64x
  • _mm_cvttss_si64x
  • _mm_extract_si64
  • _mm_extracti_si64
  • _mm_insert_si64
  • _mm_inserti_si64
  • _mm_stream_sd
  • _mm_stream_ss
  • _mm_stream_si64x
  • __movsb
  • __movsw
  • __movsd
  • __movsq
  • __noop
  • __nop
  • __popcnt16
  • __popcnt
  • __popcnt64
  • __rdtsc
  • __rdtscp
  • __readcr0
  • __readcr2
  • __readcr3
  • __readcr4
  • __readcr8
  • __readdr
  • __readeflags
  • __readmsr
  • __readpmc
  • __segmentlimit
  • __shiftleft128
  • __shiftright128
  • __sidt
  • __stosb
  • __stosw
  • __stosd
  • __stosq
  • __svm_clgi
  • __svm_invlpga
  • __svm_skinit
  • __svm_stgi
  • __svm_vmload
  • __svm_vmrun
  • __svm_vmsave
  • __ud2
  • __vmx_off
  • __vmx_on
  • __vmx_vmclear
  • __vmx_vmlaunch
  • __vmx_vmptrld
  • __vmx_vmptrst
  • __vmx_vmread
  • __vmx_vmresume
  • __vmx_vmwrite
  • __wbinvd
  • __writecr0
  • __writecr2
  • __writecr3
  • __writecr4
  • __writecr8
  • __writedr
  • __writeeflags
  • __writemsr
  • _ReadBarrier
  • _WriteBarrier
  • _ReadWriteBarrier
  • _BitScanForward
  • _BitScanReverse
  • _BitScanForward64
  • _BitScanReverse64
  • _bittest
  • _bittestandcomplement
  • _bittestandreset
  • _bittestandset
  • _bittest64
  • _bittestandcomplement64
  • _bittestandreset64
  • _bittestandset64
  • _byteswap_uint64
  • _byteswap_ulong
  • _byteswap_ushort
  • _lrotr
  • _lrotl
  • _rotr
  • _rotl
  • _rotr64
  • _rotl64
  • _rotr16
  • _rotl16
  • _rotr8
  • _rotl8

These implementations aim to be as compatible with the MSVC intrinsics as is possible—adhering to Hyrum's Law.

Separate implementations are provided for DMD, LDC, and GDC, and for x86, x86-64, AArch64, and ARM. Almost all the intrinsics are implemented in D, except for __assume which is a C macro.

Every intrinsic, where possible, has a CTFE-compatible code-path.
It all compiles with or without DIP1000 being enabled.
Care has been taken to ensure that none of the implementations rely on DRuntime, so that these work in BetterC.


Regarding the _cvt_ family of functions: by default MSVC will generate code that uses SSE2 instructions, even for 32-bit targets, which means that for 32-bit targets the _cvt_ functions will use SSE2. This is contrary to DMD's usual behaviour of using x87 for 32-bit Windows. For their reimplementations, I've used SSE2 anyway for 32-bit targets for DMD, as doing otherwise would constitute a change in behaviour, as x87 FP-exceptions are different from SIMD FP-exceptions (and, I think the SSE2 and x87 versions return different results). The oldest of the _cvt_ functions was introduced in the May of 2021, so I think it's almost certain that any code using them will be targeting at-least SSE2 anyway.

Additionally, I've written a program that tests that the _cvt_ implementations return identical results to the MSVC implementations for all float values, and for ~402,653,184 double values (except for _cvt_ftoi_fast and _cvt_ftoi_sent on 32-bit targets, as they cause an internal compiler error in MSVC); it also tests that ctfeX86RoundLongToFloat and ctfeX86RoundFloatToLong produce the same results as the hardware.
It relies on Phobos and MSVC, so I don't really know what to do with it other than link to it here: https://github.com/just-harry/float-fuzzing-for-msvc-intrinsics


A few of the intrinsics can be used only in kernel-mode, so unittests have been omitted for them, as I don't think we have any infrastructure in place for testing in kernel-mode.

Some of the intrinsics terminate the program, so their unittests have been wrapped in a version (none), others rely on specific compiler optimisations, so they too have been wrapped in a version (none).


I've split the intrinsics up into a few dozen commits to try and alleviate the whole 13,000-lines-of-codes-all-at-once thing.
(I staged them after-the-fact, so a few braces may be out-of-place in the intermediate commits.)


The intrinsics have been placed in their own files, separate from the existing ImportC builtin files, to try and avoid crowding the builtin files.

Currently, the ImportC builtins are imported conditionally, based on some heuristics. One of those heuristics is if any identifier beginning with two underscores is used – I've changed that one to instead trigger on a single leading underscore as many of MSVC's intrinsics begin with only one underscore.


One notable header that can be successfully included by ImportC, with these intrinsics implemented, is windows.h.


P.S. If this is outside the purview of ImportC: that's fine, I'll just publish this as a library instead.

just-harry avatar Apr 13 '24 01:04 just-harry

I've flipped this back to a draft PR as currently the functions in the __builtins_msvc module can be used only when the -i switch is supplied on the command-line. I think it might be that semantic3 isn't getting run on that module's functions when -i isn't supplied—but I'm unsure.

just-harry avatar Apr 15 '24 20:04 just-harry

This is still a draft as I'm unable to get the compiler to actually generate code for the __builtins_msvc module without its path being supplied on the command-line.


Presently, neither the __builtins module nor the __builtins_msvc module make their way into the array of modules passed into generateCodeAndWrite, which is fine for __builtins, but not for __builtins_msvc because of how the standard Windows headers use the builtins, and because DMD doesn't inline some of the asm-using functions.

This is effectively how winnt.h declares UnsignedMultiply128:

unsigned long long UnsignedMultiply128 (unsigned long long Multiplier, unsigned long long Multiplicand, unsigned long long *HighProduct);

#define UnsignedMultiply128 _umul128

unsigned long long UnsignedMultiply128 (unsigned long long Multiplier, unsigned long long Multiplicand, unsigned long long *HighProduct);

which will expand to unsigned long long _umul128 (unsigned long long Multiplier, unsigned long long Multiplicand, unsigned long long *HighProduct);, causing the linker to expect code for _umul128.

But even if the Windows headers didn't do that, we still need code to be generated for _umul128, because DMD won't inline _umul128's call to multiplyWithDoubleWidthProduct and the linker will expect code for it.

(_umul128 isn't the only function affected.)


Right now I'm stuck, because I've yet to figure out a good way to get the __builtins_msvc module into the modules array passed into generateCodeAndWrite after it's conditionally imported if any MSVC intrinsics are detected.

The best I've come up with thus far is very hacky, and would require the same changes to be made in LDC and GDC (unless I'm mistaken), which is: in Module.parseModule, if the module identifier is __builtins_msvc, set the global includeImports to true, and if it was previously false set includeByDefault to false, then add __builtins_msvc to the compiledImports array.

In short, my question is: is there an existing way to make it so the __builtins_msvc module winds up getting passed to generateCodeAndWrite after it gets imported by CParser.parseModule? And, if not, is there an acceptably non-hacky way to thread it through the compiler without requiring changes to LDC and GDC?

Any suggestions would be appreciated.

just-harry avatar Apr 17 '24 00:04 just-harry

I suggest having the user do the #include line as the first line in his C code. Or have the user pass it in with the dmd switch that passes switches to the C preprocessor. Will that work?

WalterBright avatar Apr 23 '24 04:04 WalterBright

@WalterBright,

I suggest having the user do the #include line as the first line in his C code. Or have the user pass it in with the dmd switch that passes switches to the C preprocessor. Will that work?

If that works for you, it works for me. I don't see an issue with:

#include <importc_msvc_builtins.h>
#include <windows.h>

I've added some code to supply <path>/importc.h/../include to the preprocessor as an include path, so that the user doesn't have to hunt down the path themselves. (https://github.com/dlang/dmd/pull/16372/commits/de631404562ee6ba353d335efda8c5905191b7b3)


Though, the issue of these builtins only working when the -i switch is used still remains.

Is there an existing way to force codegen for an imported module, when the -i switch isn't used?

just-harry avatar Apr 23 '24 22:04 just-harry

Okay; a separate PR adding a mechanism that this PR can use to force codegen for the __builtins_msvc module: https://github.com/dlang/dmd/pull/16443

Via something like this in cparse.d:

         if (token.value == TOK._import) // import declaration extension
         {
             auto a = parseImport();
             if (a && a.length)
/*+*/        {
/*+*/            auto imp = (*a)[0].isImport();
/*+*/            imp.forceCodegen = imp.id == Id.builtins_msvc;
                 symbols.append(a);
/*+*/        }
             return;
         }

just-harry avatar May 05 '24 00:05 just-harry

Though, the issue of these builtins only working when the -i switch is used still remains.

Is there an existing way to force codegen for an imported module, when the -i switch isn't used?

Did you try converting these functions to templates?

ibuclaw avatar May 05 '24 05:05 ibuclaw

Though, the issue of these builtins only working when the -i switch is used still remains. Is there an existing way to force codegen for an imported module, when the -i switch isn't used?

Did you try converting these functions to templates?

I did—but no dice, unfortunately, even with all the affected functions being templates, the simplest possible usage still fails to link:

// windows_h_c.c:
#include <importc_msvc_builtins.h>
#include <windows.h>

// windows_h.d:
pragma(lib, "MinCore");
import windows_h_c;
windows_h.obj : error LNK2019: unresolved external symbol _InterlockedExchangeAdd referenced in function _InlineInterlockedAdd
windows_h.obj : error LNK2019: unresolved external symbol _InterlockedExchangeAdd64 referenced in function _InlineInterlockedAdd64
windows_h.obj : error LNK2019: unresolved external symbol _mul128 referenced in function MultiplyExtract128
windows_h.obj : error LNK2019: unresolved external symbol __shiftright128 referenced in function MultiplyExtract128
windows_h.obj : error LNK2019: unresolved external symbol _umul128 referenced in function UnsignedMultiplyExtract128
windows_h.obj : error LNK2019: unresolved external symbol _ReadWriteBarrier referenced in function BarrierAfterRead
windows_h.obj : error LNK2019: unresolved external symbol __stosb referenced in function RtlSecureZeroMemory
windows_h.obj : error LNK2019: unresolved external symbol __readgsqword referenced in function NtCurrentTeb
windows_h.exe : fatal error LNK1120: 8 unresolved externals

just-harry avatar May 05 '24 06:05 just-harry

Just have the user add the #include line.

WalterBright avatar May 05 '24 06:05 WalterBright

Just have the user add the #include line.

Perhaps I'm misunderstanding you, but is that not what the current implementation is doing? In the example here: https://github.com/dlang/dmd/pull/16372#issuecomment-2094644837

There, the user #includes <importc_msvc_builtins.h> before <windows.h>, which works perfectly well. importc_msvc_builtins.h then does an __import __builtins_msvc; The import of __builtins_msvc succeeds because druntime/import is in the compiler's import path. But because the path to __builtins_msvc.d wasn't explicitly provided on the command-line, codegen doesn't take place for the __builtins_msvc module. Which ultimately causes the linker to fail because the way the standard Windows headers uses some of these intrinsics causes the linker to expect a definition for them.

just-harry avatar May 05 '24 06:05 just-harry

windows_h.obj : error LNK2019: unresolved external symbol _InterlockedExchangeAdd referenced in function _InlineInterlockedAdd

I can try to synthesize this later (Linux). But it would be useful to know how these symbols are being referenced in the windows headers.

My immediate suspicion is either the address is taken, or windows.h adds its own extern declaration outside of importc msvc builtins.

ibuclaw avatar May 05 '24 06:05 ibuclaw

windows_h.obj : error LNK2019: unresolved external symbol _InterlockedExchangeAdd referenced in function _InlineInterlockedAdd

I can try to synthesize this later (Linux). But it would be useful to know how these symbols are being referenced in the windows headers.

My immediate suspicion is either the address is taken, or windows.h adds its own extern declaration outside of importc msvc builtins.

I think it's the latter. For example, for __shiftright128 in winnt.h, there's

#define ShiftRight128 __shiftright128
DWORD64
ShiftRight128 (
    _In_ DWORD64 LowPart,
    _In_ DWORD64 HighPart,
    _In_ BYTE  Shift
    );
#pragma intrinsic(__shiftright128)

Similarly, for _mul128 in winnt.h:

#define Multiply128 _mul128
LONG64
Multiply128 (
    _In_ LONG64 Multiplier,
    _In_ LONG64 Multiplicand,
    _Out_ LONG64 *HighProduct
    );
#pragma intrinsic(_mul128)

The same pattern is used for the rest of affected functions, in winnt.h; with the minor difference of no #define being used for _ReadWriteBarrier, __stosb, or __readgsqword.

just-harry avatar May 05 '24 06:05 just-harry

But because the path to __builtins_msvc.d wasn't explicitly provided on the command-line, codegen doesn't take place for the __builtins_msvc module.

For now I suggest just asking the user to put it on the command line.

WalterBright avatar May 05 '24 06:05 WalterBright

But because the path to __builtins_msvc.d wasn't explicitly provided on the command-line, codegen doesn't take place for the __builtins_msvc module.

For now I suggest just asking the user to put it on the command line.

If it comes down to it, yeah okay. But I don't think it makes for a great user-experience, and the point of ImportC is to reduce friction, so I'd like to try one last thing.

With an addition to ImportC, we can make it so these intrinsics can be inlined, and so that they can be templates (removing the need for unconditional codegen for the __builtins_msvc module)—improving runtime performance, reducing compile-times, and making them easier to use: https://github.com/dlang/dmd/pull/16464

just-harry avatar May 09 '24 18:05 just-harry