onnxruntime icon indicating copy to clipboard operation
onnxruntime copied to clipboard

Fix include order build failure training build

Open georgen117 opened this issue 2 years ago • 1 comments

This fixes build fatal error c1083: Cannot open include file 'safeint/SafeInt.hpp'

This build error showed up when building the oneDNN execution provider with --enable_training

Using git bisect the issue was issolated to 75bda9f267382e59393250344248438e82a4ad15 Author: pengwa [email protected] CPU AdamW implementation (#11978)

After verifying the build without --enable_training was not failing the include paths were investigated.

I discovered that core/providers/cpu/tensor/utils.h has different includes depending if the macro SHARED_PROVIDER is defined. The only place the macro SHARED_PROVIDER is defined is inside core/providers/shared_library/provider_api.h this lead me to conclude that the tensor/utils.h was being included by the build before provider_api.h

I forced a build with SHARED_PROVIDER defined and verified the include file error did not occur if SHARED_PROVIDER was defined at the start of the build.

The include order build failure was resolved by moving the #include for the tensor/utils.h from adamwbase.h into adamw.h. This made it so provider_api.h was seen before tensor/utils.h resulting in the SHARED_PROVIDER being defined.

Signed-off-by: George Nash [email protected]

Description: Describe your changes.

Motivation and Context

  • Why is this change required? What problem does it solve? This bug was preventing building the oneDNN ep with training enabled.
  • If it fixes an open issue, please link to the issue here.

georgen117 avatar Aug 02 '22 20:08 georgen117

@pengwa I thought I would ping you since this issue was seen as a result of your commit. This is a really small change but I would like you to verify if the change causes you any problems.

georgen117 avatar Aug 02 '22 20:08 georgen117

@jywu-msft could you run /azp run Linux DNNL CI Pipeline

georgen117 avatar Aug 19 '22 18:08 georgen117

/azp run Linux DNNL CI Pipeline

jywu-msft avatar Aug 19 '22 19:08 jywu-msft

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Aug 19 '22 19:08 azure-pipelines[bot]

@pengwa do you have a moment to comment on this change? Asking you since this issue occurred as a result of a CPU AdamW implementation commit.

georgen117 avatar Aug 19 '22 20:08 georgen117

/azp run On-Device Training Tests

baijumeswani avatar Aug 19 '22 20:08 baijumeswani

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Aug 19 '22 20:08 azure-pipelines[bot]

@pengwa do you have a moment to comment on this change? Asking you since this issue occurred as a result of a CPU AdamW implementation commit.

sorry I somehow missed the messaged. The fix LGTM.

pengwa avatar Aug 23 '22 05:08 pengwa

@jywu-msft or @baijumeswani could you launch the other CI builds?

Thanks

georgen117 avatar Aug 29 '22 21:08 georgen117

@jywu-msft or @baijumeswani could you kick off the other CI builds?

Thanks

georgen117 avatar Sep 01 '22 00:09 georgen117

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline

baijumeswani avatar Sep 01 '22 00:09 baijumeswani

/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

baijumeswani avatar Sep 01 '22 00:09 baijumeswani

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Sep 01 '22 00:09 azure-pipelines[bot]

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar Sep 01 '22 00:09 azure-pipelines[bot]