onnxruntime
onnxruntime copied to clipboard
Fix include order build failure training build
This fixes build fatal error c1083: Cannot open include file 'safeint/SafeInt.hpp'
This build error showed up when building the oneDNN execution provider with --enable_training
Using git bisect
the issue was issolated to 75bda9f267382e59393250344248438e82a4ad15
Author: pengwa [email protected] CPU AdamW implementation (#11978)
After verifying the build without --enable_training was not failing the include paths were investigated.
I discovered that core/providers/cpu/tensor/utils.h
has different includes depending
if the macro SHARED_PROVIDER
is defined. The only place the macro SHARED_PROVIDER
is defined is inside core/providers/shared_library/provider_api.h
this lead me to
conclude that the tensor/utils.h
was being included by the build before provider_api.h
I forced a build with SHARED_PROVIDER
defined and verified the include file error did
not occur if SHARED_PROVIDER
was defined at the start of the build.
The include order build failure was resolved by moving the #include for the tensor/utils.h
from adamwbase.h into adamw.h. This made it so provider_api.h
was seen before
tensor/utils.h
resulting in the SHARED_PROVIDER being defined.
Signed-off-by: George Nash [email protected]
Description: Describe your changes.
Motivation and Context
- Why is this change required? What problem does it solve? This bug was preventing building the oneDNN ep with training enabled.
- If it fixes an open issue, please link to the issue here.
@pengwa I thought I would ping you since this issue was seen as a result of your commit. This is a really small change but I would like you to verify if the change causes you any problems.
@jywu-msft could you run /azp run Linux DNNL CI Pipeline
/azp run Linux DNNL CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@pengwa do you have a moment to comment on this change? Asking you since this issue occurred as a result of a CPU AdamW implementation commit.
/azp run On-Device Training Tests
Azure Pipelines successfully started running 1 pipeline(s).
@pengwa do you have a moment to comment on this change? Asking you since this issue occurred as a result of a CPU AdamW implementation commit.
sorry I somehow missed the messaged. The fix LGTM.
@jywu-msft or @baijumeswani could you launch the other CI builds?
Thanks
@jywu-msft or @baijumeswani could you kick off the other CI builds?
Thanks
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 10 pipeline(s).
Azure Pipelines successfully started running 6 pipeline(s).