onnxruntime [ROCm] NGramRepeatBlock, LongformerAttention and DecoderAttention Ops

Description: Add the NGramRepeatBlock, LongformerAttention and DecoderAttention Ops to ROCm and/or enable their tests.

Motivation and Context

Why is this change required? What problem does it solve?
- Add and test NGramRepeatBlock, LongformerAttention and DecoderAttention Ops.

Jun 23 '22 22:06 xinyazhang

Pull request contains merge conflicts.

Jul 15 '22 03:07 azure-pipelines[bot]

Pull request contains merge conflicts.

Jul 15 '22 03:07 azure-pipelines[bot]

can u pls resolve conflicts first? thx

Jul 15 '22 17:07 ytaous

> can u pls resolve conflicts first? thx

Updated @ytaous

Jul 15 '22 23:07 xinyazhang

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

Jul 16 '22 04:07 ytaous

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

Jul 16 '22 04:07 ytaous

Azure Pipelines successfully started running 9 pipeline(s).

Jul 16 '22 04:07 azure-pipelines[bot]

Azure Pipelines successfully started running 8 pipeline(s).

Jul 16 '22 04:07 azure-pipelines[bot]

pls fix lint error

Jul 18 '22 19:07 ytaous

> pls fix lint error

A few remaining lint problems:

- s = s.replace(...) in Python, which is common practice.
- long long int usage in onnxruntime/core/providers/rocm/shared_inc/fpgeneric.h, which follows the practice of the CUDA version.
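
To illustrate the first item: Python strings are immutable, so str.replace returns a new string and the result has to be rebound, which is why the s = s.replace(...) pattern is so common. The sketch below is only an illustration of that idiom. The helper name apply_replacements and the rename pairs are hypothetical and are not taken from the script touched by this PR.

```python
from __future__ import annotations


def apply_replacements(source: str, replacements: list[tuple[str, str]]) -> str:
    """Apply each (old, new) pair in order and return the rewritten text."""
    s = source
    for old, new in replacements:
        # str.replace returns a new string and never modifies s in place,
        # so the result has to be bound back to the name.
        s = s.replace(old, new)
    return s


# Hypothetical CUDA-to-HIP style renames, for illustration only.
EXAMPLE_REPLACEMENTS = [
    ("cudaMalloc", "hipMalloc"),
    ("cudaFree", "hipFree"),
]

original = "cudaMalloc(&p, n); cudaFree(p);"
converted = apply_replacements(original, EXAMPLE_REPLACEMENTS)
print(converted)  # hipMalloc(&p, n); hipFree(p);
```

The input string is left untouched; only the rebound name sees the rewritten text.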

Jul 18 '22 22:07 xinyazhang

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

Jul 18 '22 23:07 ytaous

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

Jul 18 '22 23:07 ytaous

Azure Pipelines successfully started running 8 pipeline(s).

Jul 18 '22 23:07 azure-pipelines[bot]

Azure Pipelines successfully started running 9 pipeline(s).

Jul 18 '22 23:07 azure-pipelines[bot]

I think the C++ lint error is a false alarm: those lines do not exceed the 120-character limit. The Python one is the conventional s = s.replace(...) practice.
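
One way to double-check a line-length complaint like this independently of the linter is to measure the lines directly. This is a minimal sketch, assuming the limit applies to the raw character count per line. The function name lines_over_limit is hypothetical and this is not the project's lint tooling.

```python
from __future__ import annotations


def lines_over_limit(text: str, limit: int = 120) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line strictly longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# A line of exactly 120 characters is within the limit; 121 is not.
sample = "short line\n" + "x" * 120 + "\n" + "y" * 121
print(lines_over_limit(sample))  # [(3, 121)]
```

An empty result for the flagged file would support the false-alarm reading.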

Jul 19 '22 08:07 xinyazhang

Hi any update on this? No hurry, we can try to close the others before this one, thx.

Aug 02 '22 21:08 ytaous

> Hi any update on this? No hurry, we can try to close the others before this one, thx.

Working on PR #11968 and #11972 now. Will go back to this one after pushing necessary changes.

Aug 02 '22 22:08 xinyazhang

$ (cd ./branch_build/$(git branch --show-current)/RelWithDebInfo/; ./onnxruntime_test_all --gtest_filter='NGramRepeatBlockTest.*:LongformerAttentionTest.*:DecoderAttentionTest.*:')
...
[----------] Global test environment tear-down
[==========] 13 tests from 3 test suites ran. (4578 ms total)
[  PASSED  ] 13 tests.

GPU Utilization confirmed with AMD_LOG_LEVEL=3.

Full log files: pool_v2.log pool_v2_gpu.log
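
The same run can be scripted. The sketch below only builds the command line and environment. It relies on GoogleTest accepting colon-separated patterns in --gtest_filter and on the AMD_LOG_LEVEL variable mentioned above. The helper name build_gtest_command is hypothetical, and the binary path is assumed to be relative to the build directory.

```python
import os


def build_gtest_command(binary, suites, amd_log_level=None):
    """Return (argv, env) for running only the given GoogleTest suites."""
    # GoogleTest filters are colon-separated patterns; "Suite.*" selects a whole suite.
    gtest_filter = ":".join(f"{suite}.*" for suite in suites)
    argv = [binary, f"--gtest_filter={gtest_filter}"]
    env = dict(os.environ)
    if amd_log_level is not None:
        # Raises ROCm runtime logging so GPU activity shows up in the log.
        env["AMD_LOG_LEVEL"] = str(amd_log_level)
    return argv, env


SUITES = ["NGramRepeatBlockTest", "LongformerAttentionTest", "DecoderAttentionTest"]
argv, env = build_gtest_command("./onnxruntime_test_all", SUITES, amd_log_level=3)
print(argv)
```

To actually run the tests, pass the result to subprocess.run(argv, env=env, check=True) from the build directory.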

Aug 03 '22 01:08 xinyazhang

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

Aug 03 '22 01:08 ytaous

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

Aug 03 '22 01:08 ytaous

Azure Pipelines successfully started running 9 pipeline(s).

Aug 03 '22 01:08 azure-pipelines[bot]

Azure Pipelines successfully started running 8 pipeline(s).

Aug 03 '22 01:08 azure-pipelines[bot]

fyi - https://github.com/microsoft/onnxruntime/pull/12435

Aug 03 '22 06:08 ytaous

> fyi - #12435

it's merged, can u please update it as needed? thx

Aug 03 '22 18:08 ytaous

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline

Aug 03 '22 23:08 ytaous

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline

Aug 03 '22 23:08 ytaous

Azure Pipelines successfully started running 8 pipeline(s).

Aug 03 '22 23:08 azure-pipelines[bot]

Azure Pipelines successfully started running 9 pipeline(s).

Aug 03 '22 23:08 azure-pipelines[bot]

heads up - https://github.com/microsoft/onnxruntime/pull/12448

Aug 03 '22 23:08 ytaous

@iK1D - pls also take a look, thx

Aug 08 '22 20:08 ytaous
