AMDMIGraphX
AMDMIGraphX copied to clipboard
Fix targets registration on Windows
Unfortunately, automatic target registration does not work on Windows.
The linker will not add a specified DLL as a dependency if at least a symbol from the dynamic library is not used. Therefore, linking to migrpahx_all_targets
does not link dynamic libraries on Windows as intended.
This PR changes automatic registration to runtime manual registration. Registration is done when the target library is loaded by the register_target()
function exported from the target library. It is called from the migraphx::make_target()
function.
Can we just use the -Wl,--no-as-needed
flag instead on windows?
Also, we could make migraphx_all_targets
a static library and then have it load the all targets before main
:
__attribute__((constructor)) void load_targets()
{
make_target("ref");
#ifdef HAVE_CPU
make_target("cpu");
#endif
#ifdef HAVE_GPU
make_target("gpu");
#endif
#ifdef HAVE_FPGA
make_target("fpga");
#endif
}
Can we just use the
-Wl,--no-as-needed
flag instead on windows?
It is not supported on Windows. It is only applicable to shared objects.
Also, we could make
migraphx_all_targets
a static library and then have it load the all targets beforemain
:__attribute__((constructor)) void load_targets() { make_target("ref"); #ifdef HAVE_CPU make_target("cpu"); #endif #ifdef HAVE_GPU make_target("gpu"); #endif #ifdef HAVE_FPGA make_target("fpga"); #endif }
This will now work on Windows. DLLs are not the same as shared objects. Furthermore, MSVC/Clang on Windows does not recognize the constructor attribute because Windows does not support it. One must implement DllMain() to serve a similar purpose. However, that is more complicated because it is process-attached and thread-attached aware—the auto registration of a target creates thread-local storage, making it unavailable after exiting from a library loading thread. A DLL on Windows should generate and return an execution context to a caller to store the per-process state.
I managed to simplify the PR and limit the number of required changes.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.97%. Comparing base (
354bc02
) to head (165d0ec
). Report is 141 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #2866 +/- ##
========================================
Coverage 91.97% 91.97%
========================================
Files 489 489
Lines 19390 19390
========================================
Hits 17833 17833
Misses 1557 1557
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Test | Batch | Rate new 165d0e |
Rate old 206e0f |
Diff | Compare |
---|---|---|---|---|---|
torchvision-resnet50 | 64 | 1,741.50 | 1,750.99 | -0.54% | :white_check_mark: |
torchvision-resnet50_fp16 | 64 | 4,065.27 | 4,085.88 | -0.50% | :white_check_mark: |
torchvision-densenet121 | 32 | 1,458.17 | 1,465.87 | -0.52% | :white_check_mark: |
torchvision-densenet121_fp16 | 32 | 2,516.42 | 2,524.65 | -0.33% | :white_check_mark: |
torchvision-inceptionv3 | 32 | 885.24 | 890.07 | -0.54% | :white_check_mark: |
torchvision-inceptionv3_fp16 | 32 | 1,477.42 | 1,483.80 | -0.43% | :white_check_mark: |
cadene-inceptionv4 | 16 | 410.35 | 412.53 | -0.53% | :white_check_mark: |
cadene-resnext64x4 | 16 | 417.51 | 419.70 | -0.52% | :white_check_mark: |
slim-mobilenet | 64 | 3,986.60 | 4,007.04 | -0.51% | :white_check_mark: |
slim-nasnetalarge | 64 | 100.54 | 101.00 | -0.46% | :white_check_mark: |
slim-resnet50v2 | 64 | 1,671.94 | 1,681.03 | -0.54% | :white_check_mark: |
bert-mrpc-onnx | 8 | 612.66 | 615.98 | -0.54% | :white_check_mark: |
bert-mrpc-tf | 1 | 276.92 | 279.26 | -0.84% | :white_check_mark: |
pytorch-examples-wlang-gru | 1 | 321.22 | 324.15 | -0.90% | :white_check_mark: |
pytorch-examples-wlang-lstm | 1 | 290.98 | 327.92 | -11.26% | :red_circle: |
torchvision-resnet50_1 | 1 | 467.81 | 469.45 | -0.35% | :white_check_mark: |
cadene-dpn92_1 | 1 | 246.86 | 247.24 | -0.15% | :white_check_mark: |
cadene-resnext101_1 | 1 | 203.25 | 203.51 | -0.13% | :white_check_mark: |
onnx-taau-downsample | 1 | 205.51 | 206.27 | -0.37% | :white_check_mark: |
dlrm-criteoterabyte | 1 | 22.82 | 22.91 | -0.39% | :white_check_mark: |
dlrm-criteoterabyte_fp16 | 1 | 42.56 | 42.74 | -0.43% | :white_check_mark: |
agentmodel | 1 | 6,340.55 | 6,392.31 | -0.81% | :white_check_mark: |
unet_fp16 | 2 | 34.04 | 34.21 | -0.49% | :white_check_mark: |
resnet50v1_fp16 | 1 | 585.28 | 597.26 | -2.01% | :white_check_mark: |
resnet50v1_int8 | 1 | 577.83 | 572.90 | 0.86% | :white_check_mark: |
bert_base_cased_fp16 | 64 | 642.42 | 646.37 | -0.61% | :white_check_mark: |
bert_large_uncased_fp16 | 32 | 197.82 | 199.03 | -0.61% | :white_check_mark: |
bert_large_fp16 | 1 | 116.83 | 117.51 | -0.58% | :white_check_mark: |
distilgpt2_fp16 | 16 | 1,204.72 | 1,211.72 | -0.58% | :white_check_mark: |
yolov5s | 1 | 300.61 | 302.25 | -0.54% | :white_check_mark: |
tinyllama | 1 | 23.21 | 23.34 | -0.55% | :white_check_mark: |
vicuna-fastchat | 1 | 133.45 | 134.18 | -0.54% | :white_check_mark: |
whisper-tiny-encoder | 1 | 243.16 | 244.17 | -0.41% | :white_check_mark: |
whisper-tiny-decoder | 1 | 255.26 | 256.68 | -0.55% | :white_check_mark: |
This build is not recommended to merge :red_circle:
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
Furthermore, MSVC/Clang on Windows does not recognize the constructor attribute because Windows does not support it.
Sure, we can write it using standard C++ as well:
struct auto_load_targets
{
auto_load_targets()
{
make_target("ref");
#ifdef HAVE_CPU
make_target("cpu");
#endif
#ifdef HAVE_GPU
make_target("gpu");
#endif
#ifdef HAVE_FPGA
make_target("fpga");
#endif
}
};
static auto load_targets = auto_load_targets();
And this does work on windows because that is how we auto register unit tests.
One must implement DllMain() to serve a similar purpose.
Why would we call DllMain
? This is not run from a dll. This is a static library being linked into an executable.
@pfultz2 it is done
@causten CodeCov reposts a 0.01% test coverage decrease. However, when I look into the pull request details, it says that the patch is 100% covered. Am I reading it wrong?