
Enablement of onnxruntime for AIX and fixing issues related to big-endian platform.

Open ranjitshs opened this issue 1 year ago • 21 comments

Description

Enablement of onnxruntime for AIX and fixing issues related to big-endian platform.

Motivation and Context

This PR contains the following changes:

  1. Enablement code for building onnxruntime on the AIX operating system.
  2. While testing the build on AIX, we found issues related to the big-endian platform. More details about some of those issues can be found in "Big endian issue: Graph Transformation Attention Fusion tests are failing" (#12921).

Below is the list of changed files with a description of each change.

  1. cmake/CMakeLists.txt [BUILDING on AIX issue] Added a check for "IBMClang" to handle -Wno-unused-parameter.
  2. cmake/external/onnxruntime_external_deps.cmake [BUILDING on AIX issue] Enabling gtest_disable_pthreads for AIX.
  3. cmake/onnxruntime.cmake [BUILDING on AIX issue]
     o Excluding, on AIX, the code that generates generated_source.c and in turn requires some symbol files.
     o Guarding unsupported linker flags such as --Xlinker with a non-AIX check.
     o iconv linking.
  4. cmake/onnxruntime_framework.cmake [BUILDING on AIX issue] Guarding -Wl,-rpath='$ORIGIN' with a non-AIX check.
  5. cmake/onnxruntime_mlas.cmake [BUILDING on AIX issue] POWER10-related macro/function definitions.
  6. cmake/onnxruntime_providers_cpu.cmake [BUILDING on AIX issue] Guarding unsupported linker flags such as --Xlinker with a non-AIX check.
  7. cmake/onnxruntime_unittests.cmake [BUILDING on AIX issue]
     o Guarding unsupported linker flags such as --Xlinker with a non-AIX check.
     o Adding the libraries required by the AIX linker to applications such as onnxruntime_shared_lib_test, onnxruntime_logging_apis, etc.
  8. cmake/patches/flatbuffers/flatbuffers.patch [BUILDING on AIX issue] Handling of TypeCode in include/flatbuffers/flatbuffers.h under AIX + clang
  9. onnxruntime/contrib_ops/cpu/murmur_hash3.cc [Big endian issue] Byte-conversion handling in the compute() and getblock() routines.
  10. onnxruntime/contrib_ops/cpu/quantization/matmul_nbits_impl.cc [Big endian issue] Handling of test failures: byte swapping for quant_value.
  11. onnxruntime/core/framework/tensorprotoutils.cc [Big endian issue] Implementation of SetRawDataInTensorProto and ConvertRawDataInTensorProto.
     o SetRawDataInTensorProto: wrapper for set_raw_data() that calls ConvertRawDataInTensorProto() on big-endian systems.
     o ConvertRawDataInTensorProto: function used mainly on big-endian systems for byte-swapping the tensor raw_data.
  12. onnxruntime/core/framework/tensorprotoutils.h [Big endian issue] Declaration of SetRawDataInTensorProto, ConvertRawDataInTensorProto
  13. onnxruntime/core/graph/graph.cc [Big endian issue]
     o Call ConvertRawDataInTensorProto for the SPARSE_TENSOR type.
     o Call ConvertRawDataInTensorProto for SaveToOrtFormat.
  14. onnxruntime/core/mlas/lib/platform.cpp [BUILDING on AIX issue] POWER10-related enablement for AIX
  15. onnxruntime/core/mlas/lib/power/qgemm_kernel_power10.cpp [BUILDING on AIX issue] Handling of __vector under AIX+clang
  16. onnxruntime/core/mlas/lib/qgemm.h [BUILDING on AIX issue] Adding _AIX flag
  17. onnxruntime/core/mlas/lib/qlmul.cpp [BUILDING on AIX issue] Handling of __vector under AIX+clang
  18. onnxruntime/core/optimizer/attention_fusion.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  19. onnxruntime/core/optimizer/compute_optimizer/shared_utils.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  20. onnxruntime/core/optimizer/constant_folding.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  21. onnxruntime/core/optimizer/embed_layer_norm_fusion.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  22. onnxruntime/core/optimizer/nchwc_transformer.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  23. onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  24. onnxruntime/core/optimizer/qdq_transformer/qdq_s8_to_u8.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  25. onnxruntime/core/optimizer/qdq_transformer/s8_to_u8.h [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  26. onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_actions.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  27. onnxruntime/core/optimizer/reshape_fusion.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  28. onnxruntime/core/optimizer/stft_decomposition.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  29. onnxruntime/core/optimizer/transpose_optimization/ort_optimizer_api_impl.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  30. onnxruntime/core/platform/path_lib.h [BUILDING on AIX issue] Moving to normal function call, instead of template
  31. onnxruntime/core/platform/posix/env.cc [BUILDING on AIX issue] Blocking syscall.h in AIX
  32. onnxruntime/core/session/inference_session.cc [Big endian issue] Removing ORT_RETURN_IF_NOT, FLATBUFFERS_LITTLEENDIAN
  33. onnxruntime/test/flatbuffers/flatbuffer_utils_test.cc [Big endian issue] Call ConvertRawDataInTensorProto in CreateInitializer and ExternalWriteReadWithLoadInitializers
  34. onnxruntime/test/framework/sparse_kernels_test.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  35. onnxruntime/test/framework/tensorutils_test.cc [Big endian issue] Added helper method ConvertEndianessForVector and called it where required.
  36. onnxruntime/test/framework/test_tensor_loader.cc
     o [BUILDING on AIX issue] Handling of getcwd for AIX.
     o [Big endian issue] Byte swapping in run_external_data_test.
  37. onnxruntime/test/onnx/main.cc [Big endian issue] including for AIX
  38. onnxruntime/test/onnx/tensorprotoutils.cc [Big endian issue] Byte swapping in UnpackTensorWithRawData
  39. onnxruntime/test/optimizer/graph_transform_test.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  40. onnxruntime/test/optimizer/graph_transform_test_builder.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  41. onnxruntime/test/optimizer/graph_transform_test_builder.h [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  42. onnxruntime/test/optimizer/initializer_test.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  43. onnxruntime/test/optimizer/nchwc_optimizer_test.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  44. onnxruntime/test/providers/base_tester.cc [Big endian issue] Use util function SetRawDataInTensorProto, instead of set_raw_data
  45. onnxruntime/test/providers/cpu/generator/random_test.cc [BUILDING on AIX issue] Adding AIX check in MultinomialGoodCase
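
To illustrate the big-endian items above: ONNX stores tensor raw_data in little-endian byte order, so a big-endian host has to reverse the bytes of each element before writing or after reading it. The following is only a minimal sketch of that idea, not the code in this PR. HostIsBigEndian, SwapElementBytes and ToLittleEndianRaw are hypothetical names chosen for illustration, and the sketch works on a plain std::string rather than on a TensorProto.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not an onnxruntime function): detects the host byte order
// by inspecting the first byte in memory of the 16-bit value 1.
inline bool HostIsBigEndian() {
  const std::uint16_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 0;  // on big-endian hosts the low-order byte is stored last
}

// Hypothetical helper: reverses the bytes of every element_size-wide element in place.
// Returns false (leaving the buffer untouched) if the buffer length is not a
// multiple of element_size.
inline bool SwapElementBytes(std::string& raw, std::size_t element_size) {
  if (element_size == 0 || raw.size() % element_size != 0) return false;
  if (element_size == 1) return true;  // single-byte types need no swapping
  for (std::size_t i = 0; i < raw.size(); i += element_size) {
    char* element = &raw[i];
    std::reverse(element, element + element_size);
  }
  return true;
}

// Hypothetical wrapper in the spirit of "set raw data, swapping on big-endian":
// copies host-order values into a byte string that is always little-endian.
template <typename T>
std::string ToLittleEndianRaw(const std::vector<T>& values) {
  std::string raw(values.size() * sizeof(T), '\0');
  if (!values.empty()) std::memcpy(&raw[0], values.data(), raw.size());
  if (HostIsBigEndian()) SwapElementBytes(raw, sizeof(T));
  return raw;
}
```

With this sketch, ToLittleEndianRaw<std::uint32_t>({0x01020304u}) yields the bytes 04 03 02 01 on both little-endian and big-endian hosts. Routing every write through one wrapper, as the PR does by replacing direct set_raw_data calls, keeps the byte-order decision in a single place.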

ranjitshs avatar Jun 21 '24 06:06 ranjitshs

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

tianleiwu avatar Jun 22 '24 21:06 tianleiwu

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

tianleiwu avatar Jun 22 '24 21:06 tianleiwu

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

tianleiwu avatar Jun 22 '24 21:06 tianleiwu

Azure Pipelines successfully started running 3 pipeline(s).

azure-pipelines[bot] avatar Jun 22 '24 21:06 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Jun 22 '24 21:06 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Jun 22 '24 21:06 azure-pipelines[bot]

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

tianleiwu avatar Jun 24 '24 17:06 tianleiwu

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

tianleiwu avatar Jun 24 '24 17:06 tianleiwu

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

tianleiwu avatar Jun 24 '24 17:06 tianleiwu

Azure Pipelines successfully started running 3 pipeline(s).

azure-pipelines[bot] avatar Jun 24 '24 17:06 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Jun 24 '24 17:06 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Jun 24 '24 17:06 azure-pipelines[bot]

@ranjitshs, you can fix python format by running the following at root:

pip install -r requirements-lintrunner.txt
pip install lintrunner
lintrunner init
lintrunner -a

tianleiwu avatar Jun 24 '24 17:06 tianleiwu

@tianleiwu Thanks for the lintrunner suggestion. Please do not initiate a workflow run yet. I have only pulled the main branch; no change has been committed yet.

ranjitshs avatar Jun 25 '24 10:06 ranjitshs

@tianleiwu Could you please initiate the workflow execution?

ranjitshs avatar Jun 26 '24 10:06 ranjitshs

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

tianleiwu avatar Jun 26 '24 23:06 tianleiwu

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Jun 26 '24 23:06 azure-pipelines[bot]

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

tianleiwu avatar Jun 26 '24 23:06 tianleiwu

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Jun 26 '24 23:06 azure-pipelines[bot]

@tianleiwu I have fixed the conflicts with the main branch. Could you please trigger the build pipeline?

ranjitshs avatar Jul 01 '24 10:07 ranjitshs

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

tianleiwu avatar Jul 01 '24 17:07 tianleiwu

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

tianleiwu avatar Jul 01 '24 17:07 tianleiwu

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Jul 01 '24 17:07 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Jul 01 '24 17:07 azure-pipelines[bot]

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

tianleiwu avatar Jul 02 '24 17:07 tianleiwu

Azure Pipelines successfully started running 3 pipeline(s).

azure-pipelines[bot] avatar Jul 02 '24 17:07 azure-pipelines[bot]

@ranjitshs, please follow up with the LCA comment: https://github.com/microsoft/onnxruntime/pull/21133#issuecomment-2182065619

tianleiwu avatar Jul 02 '24 22:07 tianleiwu

@ranjitshs, please follow up with the LCA comment: #21133 (comment)

Yes. I am working with my company's legal team.

ranjitshs avatar Jul 09 '24 10:07 ranjitshs

/azp run orttraining-amd-gpu-ci-pipeline

tianleiwu avatar Jul 15 '24 17:07 tianleiwu

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jul 15 '24 17:07 azure-pipelines[bot]