AMDMIGraphX icon indicating copy to clipboard operation
AMDMIGraphX copied to clipboard

Fix targets registration on Windows

Open apwojcik opened this issue 11 months ago • 10 comments

Unfortunately, automatic target registration does not work on Windows. The linker will not add a specified DLL as a dependency if at least a symbol from the dynamic library is not used. Therefore, linking to migrpahx_all_targets does not link dynamic libraries on Windows as intended.

This PR changes automatic registration to runtime manual registration. Registration is done when the target library is loaded by the register_target() function exported from the target library. It is called from the migraphx::make_target() function.

apwojcik avatar Mar 07 '24 15:03 apwojcik

Can we just use the -Wl,--no-as-needed flag instead on windows?

pfultz2 avatar Mar 07 '24 17:03 pfultz2

Also, we could make migraphx_all_targets a static library and then have it load the all targets before main:

__attribute__((constructor)) void load_targets()
{
            make_target("ref");
    #ifdef HAVE_CPU
            make_target("cpu");
    #endif
    #ifdef HAVE_GPU
            make_target("gpu");
    #endif
    #ifdef HAVE_FPGA
            make_target("fpga");
    #endif
}

pfultz2 avatar Mar 07 '24 17:03 pfultz2

Can we just use the -Wl,--no-as-needed flag instead on windows?

It is not supported on Windows. It is only applicable to shared objects.

apwojcik avatar Mar 07 '24 21:03 apwojcik

Also, we could make migraphx_all_targets a static library and then have it load the all targets before main:

__attribute__((constructor)) void load_targets()
{
            make_target("ref");
    #ifdef HAVE_CPU
            make_target("cpu");
    #endif
    #ifdef HAVE_GPU
            make_target("gpu");
    #endif
    #ifdef HAVE_FPGA
            make_target("fpga");
    #endif
}

This will now work on Windows. DLLs are not the same as shared objects. Furthermore, MSVC/Clang on Windows does not recognize the constructor attribute because Windows does not support it. One must implement DllMain() to serve a similar purpose. However, that is more complicated because it is process-attached and thread-attached aware—the auto registration of a target creates thread-local storage, making it unavailable after exiting from a library loading thread. A DLL on Windows should generate and return an execution context to a caller to store the per-process state.

apwojcik avatar Mar 07 '24 21:03 apwojcik

I managed to simplify the PR and limit the number of required changes.

apwojcik avatar Mar 07 '24 21:03 apwojcik

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.97%. Comparing base (354bc02) to head (165d0ec). Report is 141 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2866   +/-   ##
========================================
  Coverage    91.97%   91.97%           
========================================
  Files          489      489           
  Lines        19390    19390           
========================================
  Hits         17833    17833           
  Misses        1557     1557           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 07 '24 22:03 codecov[bot]

Test Batch Rate new
165d0e
Rate old
206e0f
Diff Compare
torchvision-resnet50 64 1,741.50 1,750.99 -0.54% :white_check_mark:
torchvision-resnet50_fp16 64 4,065.27 4,085.88 -0.50% :white_check_mark:
torchvision-densenet121 32 1,458.17 1,465.87 -0.52% :white_check_mark:
torchvision-densenet121_fp16 32 2,516.42 2,524.65 -0.33% :white_check_mark:
torchvision-inceptionv3 32 885.24 890.07 -0.54% :white_check_mark:
torchvision-inceptionv3_fp16 32 1,477.42 1,483.80 -0.43% :white_check_mark:
cadene-inceptionv4 16 410.35 412.53 -0.53% :white_check_mark:
cadene-resnext64x4 16 417.51 419.70 -0.52% :white_check_mark:
slim-mobilenet 64 3,986.60 4,007.04 -0.51% :white_check_mark:
slim-nasnetalarge 64 100.54 101.00 -0.46% :white_check_mark:
slim-resnet50v2 64 1,671.94 1,681.03 -0.54% :white_check_mark:
bert-mrpc-onnx 8 612.66 615.98 -0.54% :white_check_mark:
bert-mrpc-tf 1 276.92 279.26 -0.84% :white_check_mark:
pytorch-examples-wlang-gru 1 321.22 324.15 -0.90% :white_check_mark:
pytorch-examples-wlang-lstm 1 290.98 327.92 -11.26% :red_circle:
torchvision-resnet50_1 1 467.81 469.45 -0.35% :white_check_mark:
cadene-dpn92_1 1 246.86 247.24 -0.15% :white_check_mark:
cadene-resnext101_1 1 203.25 203.51 -0.13% :white_check_mark:
onnx-taau-downsample 1 205.51 206.27 -0.37% :white_check_mark:
dlrm-criteoterabyte 1 22.82 22.91 -0.39% :white_check_mark:
dlrm-criteoterabyte_fp16 1 42.56 42.74 -0.43% :white_check_mark:
agentmodel 1 6,340.55 6,392.31 -0.81% :white_check_mark:
unet_fp16 2 34.04 34.21 -0.49% :white_check_mark:
resnet50v1_fp16 1 585.28 597.26 -2.01% :white_check_mark:
resnet50v1_int8 1 577.83 572.90 0.86% :white_check_mark:
bert_base_cased_fp16 64 642.42 646.37 -0.61% :white_check_mark:
bert_large_uncased_fp16 32 197.82 199.03 -0.61% :white_check_mark:
bert_large_fp16 1 116.83 117.51 -0.58% :white_check_mark:
distilgpt2_fp16 16 1,204.72 1,211.72 -0.58% :white_check_mark:
yolov5s 1 300.61 302.25 -0.54% :white_check_mark:
tinyllama 1 23.21 23.34 -0.55% :white_check_mark:
vicuna-fastchat 1 133.45 134.18 -0.54% :white_check_mark:
whisper-tiny-encoder 1 243.16 244.17 -0.41% :white_check_mark:
whisper-tiny-decoder 1 255.26 256.68 -0.55% :white_check_mark:

This build is not recommended to merge :red_circle:

migraphx-bot avatar Mar 08 '24 03:03 migraphx-bot


     :white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
     :white_check_mark: unet: PASSED: MIGraphX meets tolerance
     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
:red_circle:bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

     :white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

migraphx-bot avatar Mar 08 '24 03:03 migraphx-bot

Furthermore, MSVC/Clang on Windows does not recognize the constructor attribute because Windows does not support it.

Sure, we can write it using standard C++ as well:

struct auto_load_targets
{
    auto_load_targets()
    {
        make_target("ref");
#ifdef HAVE_CPU
        make_target("cpu");
#endif
#ifdef HAVE_GPU
        make_target("gpu");
#endif
#ifdef HAVE_FPGA
        make_target("fpga");
#endif
    }
};

static auto load_targets = auto_load_targets();

And this does work on windows because that is how we auto register unit tests.

One must implement DllMain() to serve a similar purpose.

Why would we call DllMain? This is not run from a dll. This is a static library being linked into an executable.

pfultz2 avatar Mar 08 '24 17:03 pfultz2

@pfultz2 it is done

apwojcik avatar May 08 '24 18:05 apwojcik

@causten CodeCov reposts a 0.01% test coverage decrease. However, when I look into the pull request details, it says that the patch is 100% covered. Am I reading it wrong?

apwojcik avatar May 28 '24 13:05 apwojcik