
Improvements to the INT8 GEMM portion of the code for Power

Open ChipKerchner opened this issue 1 year ago • 18 comments

These changes improve the INT8 GEMM portion of the code for Power.

There are 3 main code changes:

  1. Change a function to a template parameter so that operations that add or subtract zero are eliminated at compile time. Also reuse the vector that holds the mask instead of rebuilding it each time.
  2. Process 16 columns at a time in MlasGemmQuantCopyPackB8x8. This should reduce potential page faults by a factor of 4 and should also be faster.
  3. Unroll MlasQgemmStoreVectorMMA and vectorize other variables.
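
For readers unfamiliar with the first technique, here is a minimal C++17 sketch of moving a runtime decision into a template parameter so that the compiler can drop a subtract-zero operation entirely. It is not the MLAS code: `DotInt8`, `DotInt8Dispatch`, and the `zero_point_a` parameter are hypothetical names chosen for illustration, and the scalar loop stands in for the vectorized Power kernels.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of MLAS.
// HasZeroPoint is a compile-time flag: when it is false, the subtraction
// below is discarded by `if constexpr`, so the instantiated loop contains
// no "subtract zero" work at all.
template <bool HasZeroPoint>
int32_t DotInt8(const int8_t* a, const int8_t* b, size_t n, int32_t zero_point_a) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t av = a[i];
        if constexpr (HasZeroPoint) {
            av -= zero_point_a;  // only compiled into the <true> instantiation
        }
        acc += av * static_cast<int32_t>(b[i]);
    }
    return acc;
}

// The runtime check happens once here, outside the hot loop.
int32_t DotInt8Dispatch(const int8_t* a, const int8_t* b, size_t n, int32_t zero_point_a) {
    return zero_point_a == 0 ? DotInt8<false>(a, b, n, 0)
                             : DotInt8<true>(a, b, n, zero_point_a);
}
```

The trade-off is a second instantiation of the kernel in exchange for removing per-element work from the common zero case.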

ChipKerchner avatar May 07 '24 13:05 ChipKerchner

Cc: @chenfucn

yuslepukhin avatar May 08 '24 18:05 yuslepukhin

Questions about the lint warnings. Do all lines have to be less than 120 characters? And for statements that span multiple lines, how are they written?

ChipKerchner avatar May 08 '24 22:05 ChipKerchner

Typically, we use clang-format. I use a visual cue in the IDE to spot long lines, break them manually, then run the formatter again.

yuslepukhin avatar May 08 '24 22:05 yuslepukhin

Which style should I be using for clang-format? microsoft?

It seems the formatter wants to change a lot of code that I did not alter.

ChipKerchner avatar May 08 '24 22:05 ChipKerchner

> Which style should I be using for clang-format? microsoft?
>
> It seems the formatter wants to change a lot of code that I did not alter.

Your editor should pick this up automatically. https://github.com/microsoft/onnxruntime/blob/main/.clang-format

yuslepukhin avatar May 08 '24 23:05 yuslepukhin

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

yufenglee avatar May 09 '24 00:05 yufenglee

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

yufenglee avatar May 09 '24 00:05 yufenglee

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

yufenglee avatar May 09 '24 00:05 yufenglee

Azure Pipelines successfully started running 2 pipeline(s).

azure-pipelines[bot] avatar May 09 '24 00:05 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar May 09 '24 00:05 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar May 09 '24 00:05 azure-pipelines[bot]

I use vim as my editor. I'm not sure it will pick up lint formatting.

ChipKerchner avatar May 09 '24 11:05 ChipKerchner

> I use vim as my editor. I'm not sure it will pick up lint formatting.

Most of the failures are about extra whitespace. Many editors can display non-visible characters.

yuslepukhin avatar May 09 '24 22:05 yuslepukhin

I'm seeing about a 2.6-4x improvement for PackB.

ChipKerchner avatar May 13 '24 14:05 ChipKerchner

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

yufenglee avatar May 13 '24 18:05 yufenglee

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

yufenglee avatar May 13 '24 18:05 yufenglee

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar May 13 '24 18:05 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar May 13 '24 18:05 azure-pipelines[bot]

Can we move forward with this PR?

ChipKerchner avatar May 22 '24 13:05 ChipKerchner

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

yuslepukhin avatar May 22 '24 18:05 yuslepukhin

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

yuslepukhin avatar May 22 '24 18:05 yuslepukhin

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar May 22 '24 18:05 azure-pipelines[bot]

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar May 22 '24 18:05 azure-pipelines[bot]

Can we merge this?

ChipKerchner avatar Jun 04 '24 19:06 ChipKerchner

/azp run orttraining-ortmodule-distributed,

yufenglee avatar Jun 05 '24 20:06 yufenglee

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jun 05 '24 20:06 azure-pipelines[bot]

/azp run Linux Android Emulator QNN CI Pipeline

yufenglee avatar Jun 05 '24 20:06 yufenglee

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Jun 05 '24 20:06 azure-pipelines[bot]

> Can we merge this?

Thanks @ChipKerchner !

yufenglee avatar Jun 05 '24 21:06 yufenglee