
[java] Sparse tensor support

Open Craigacp opened this pull request 3 years ago • 23 comments

Description:

Adds support for creating and receiving sparse tensors in the ORT Java API.

CSRC and COO tensors are tested as inputs, but there is no op which accepts a block sparse tensor to test against. COO tensors are tested as outputs, but there is no op which emits a CSRC or block sparse tensor to test against.
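For example, creating a COO tensor and running it through a session looks roughly like this. This is a sketch against this PR's API surface; the model path, the input name, and the exact constructor signatures are assumptions and may differ from what is finally merged:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.LongBuffer;
import java.util.Collections;
import ai.onnxruntime.OnnxJavaType;
import ai.onnxruntime.OnnxSparseTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

public class SparseExample {
    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();

        // Two non-zeros in a [3, 3] dense matrix, COO format: each index row
        // is a [row, col] pair, stored in direct buffers so no copy is needed.
        LongBuffer indices = ByteBuffer.allocateDirect(4 * Long.BYTES)
                .order(ByteOrder.nativeOrder()).asLongBuffer();
        indices.put(new long[] {0, 1, 2, 2});
        indices.rewind();

        FloatBuffer values = ByteBuffer.allocateDirect(2 * Float.BYTES)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        values.put(new float[] {5.0f, 7.0f});
        values.rewind();

        OnnxSparseTensor.COOTensor coo = new OnnxSparseTensor.COOTensor(
                indices, new long[] {2, 2},   // indices and their shape
                values, new long[] {3, 3},    // values and the dense shape
                OnnxJavaType.FLOAT, 2);       // element type, non-zero count

        try (OnnxSparseTensor tensor = OnnxSparseTensor.createSparseTensor(env, coo);
             OrtSession session = env.createSession("sparse-model.onnx"); // placeholder
             OrtSession.Result result =
                     session.run(Collections.singletonMap("input", tensor))) {
            System.out.println(result.get(0).getInfo());
        }
    }
}
```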

Motivation and Context

  • Why is this change required? What problem does it solve? It addresses a request to expose ORT sparse tensor support in Java.

cc @yuslepukhin

Craigacp avatar Feb 24 '22 03:02 Craigacp

One question: does this code still copy memory for inputs and outputs? We should not copy, at least for inputs, and I believe there is a way to prevent copies on output as well.

yuslepukhin avatar Feb 28 '22 18:02 yuslepukhin

It does not copy on input if the buffers are direct. If they aren't direct then they are copied into direct buffers, which are passed into the OrtValue. The OnnxSparseTensor then holds a reference to the buffers so they don't get garbage collected.
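A minimal sketch of that input-side logic, not the actual implementation:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Buffers {
    // If the user-supplied buffer is already direct, hand its address to the
    // OrtValue with no copy; otherwise copy once into a direct buffer, which
    // the OnnxSparseTensor then keeps referenced so the GC can't reclaim it
    // while the native side still points at it.
    static ByteBuffer makeDirect(ByteBuffer data) {
        if (data.isDirect()) {
            return data; // zero-copy path
        }
        ByteBuffer direct = ByteBuffer.allocateDirect(data.remaining())
                .order(ByteOrder.nativeOrder());
        direct.put(data.duplicate()); // single copy into native-addressable memory
        direct.rewind();
        return direct;
    }
}
```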

For outputs we could wrap the bare pointers in direct byte buffers, but if those buffers are exposed out of the OnnxSparseTensor then they must not live past the lifetime of the sparse tensor, and that's tricky to enforce. Currently the output path copies the direct buffer into another buffer before wrapping the copies in the COOTensor/CSRCTensor/BlockSparseTensor.

Craigacp avatar Feb 28 '22 19:02 Craigacp

Well, we are doing just fine in C#, holding on to the native pointers received from ORT and destroying them when Dispose is invoked (similar to AutoCloseable). If that does not work, here is an idea: if the output shapes are known, pre-allocate output buffers, create OrtValues, and pass them as outputs with the corresponding name mappings. You still need to hold on to the OrtValue pointers and destroy them on close.
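A hypothetical Java sketch of that lifetime pattern; NativeOrtValue and releaseNative are illustrative names, not part of the ORT API:

```java
// Hold the native OrtValue pointer received from ORT and release it in
// close(), the Java analogue of C#'s IDisposable.Dispose().
final class NativeOrtValue implements AutoCloseable {
    private final long handle; // raw pointer returned by ORT
    private boolean closed = false;

    NativeOrtValue(long handle) {
        this.handle = handle;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            releaseNative(handle); // frees the OrtValue via JNI
        }
    }

    private static native void releaseNative(long handle);
}
```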


In reply to: 1054582742

yuslepukhin avatar Feb 28 '22 21:02 yuslepukhin

The direct byte buffer abstraction in Java is different from the native pointer abstraction in C#, and I don't think there is a good solution for outputs that both has a zero-copy output path and exposes the buffers to users. We could alternatively expose direct access into the OrtValue by implementing chunks of the buffer API on top of OnnxSparseTensor (or in private, lifetime-limited objects inside OnnxSparseTensor), but that's a substantial development effort.

Using a sparse tensor output from one ORT session as an input to another is zero copy; it's only getting the values back out into Java that triggers a copy. When the Java foreign memory API is completed (and ORT is updated to require that minimum Java version, which might involve forking the Java API from the Android one) there will be better options for managing the lifetimes of these chunks of memory.
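A minimal sketch of that distinction; the session setup, input/output names, and getValue behaviour described here are assumptions for illustration:

```java
import java.util.Collections;
import ai.onnxruntime.*;

class Chain {
    // Feed a sparse output of one session straight into another: zero copy.
    // Only materialising the values back in Java copies.
    static void chain(OrtSession sessionA, OrtSession sessionB, OnnxTensor input)
            throws OrtException {
        try (OrtSession.Result first =
                 sessionA.run(Collections.singletonMap("dense_in", input))) {
            OnnxSparseTensor sparse = (OnnxSparseTensor) first.get(0);

            // Zero copy: the OrtValue pointer is handed to the second run as-is.
            try (OrtSession.Result second =
                     sessionB.run(Collections.singletonMap("sparse_in", sparse))) {
                System.out.println(second.get(0).getInfo());
            }

            // Copy: pulling the values out into Java allocates fresh buffers.
            Object javaSide = sparse.getValue();
        }
    }
}
```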

Pre-allocation could work, but it would require a rewrite of all the output processing in both the Java and the native code. I'd missed the update to Run which switched the output pointer over to _Inout_; that seems to have changed after the Java API was initially integrated. Presumably it throws an exception if one of the supplied OrtValues is the wrong size or shape?


In reply to: 1054714133

Craigacp avatar Feb 28 '22 22:02 Craigacp

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline

yuslepukhin avatar Feb 28 '22 23:02 yuslepukhin

/azp run MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

yuslepukhin avatar Feb 28 '22 23:02 yuslepukhin

Azure Pipelines successfully started running 7 pipeline(s).

azure-pipelines[bot] avatar Feb 28 '22 23:02 azure-pipelines[bot]

Azure Pipelines successfully started running 7 pipeline(s).

azure-pipelines[bot] avatar Feb 28 '22 23:02 azure-pipelines[bot]

I fixed the build error. Clang must do different exhaustiveness checking on enums than gcc or msvc, as the latter two complained that I wasn't assigning a value in a switch which covered all the enum cases. I added a default branch which maps to ORT_SPARSE_UNDEFINED, which should make them happy. Plus I rebased after the merge of #10670.
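For comparison, javac behaves like gcc/msvc here: a statement switch that covers every enum constant still doesn't count as exhaustive, so a trailing return or default branch is required. The enum below is illustrative, not the actual ORT one:

```java
enum SparseFormat { UNDEFINED, COO, CSRC, BLOCK_SPARSE }

class Exhaustive {
    static int code(SparseFormat f) {
        switch (f) {
            case UNDEFINED:    return 0;
            case COO:          return 1;
            case CSRC:         return 2;
            case BLOCK_SPARSE: return 3;
        }
        // Without this line javac reports "missing return statement", even
        // though the switch covers every constant; gcc and msvc raise the
        // same complaint for the C switch, hence the added default branch.
        return 0;
    }
}
```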

Craigacp avatar Mar 01 '22 15:03 Craigacp

Pre-allocation could work, but it would require a rewrite of all the output processing in both the Java and the native code. I'd missed the update to Run which switched the output pointer over to _Inout_; that seems to have changed after the Java API was initially integrated. Presumably it throws an exception if one of the supplied OrtValues is the wrong size or shape?

Yes, it throws, and it does so early. The API docs and the parameter annotations still need work. This feature has been available for a long time. Let's fix error reporting first and then we can work on copy elimination. Both are very important, not so much right now for sparse tensors, but for dense tensors in the first place.

yuslepukhin avatar Mar 01 '22 18:03 yuslepukhin

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline

yuslepukhin avatar Mar 01 '22 18:03 yuslepukhin

/azp run MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

yuslepukhin avatar Mar 01 '22 18:03 yuslepukhin

Azure Pipelines successfully started running 7 pipeline(s).

azure-pipelines[bot] avatar Mar 01 '22 18:03 azure-pipelines[bot]

Azure Pipelines successfully started running 7 pipeline(s).

azure-pipelines[bot] avatar Mar 01 '22 18:03 azure-pipelines[bot]

/azp run onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, ONNX Runtime Web CI Pipeline

yuslepukhin avatar Mar 02 '22 19:03 yuslepukhin

Azure Pipelines successfully started running 3 pipeline(s).

azure-pipelines[bot] avatar Mar 02 '22 19:03 azure-pipelines[bot]

Can we get this integrated now that the 1.11 release has happened? I can rebase it on master if necessary, and then it'll be easier to work on the native binding improvements that we discussed.

Craigacp avatar Apr 19 '22 19:04 Craigacp

Can we get this integrated now that the 1.11 release has happened? I can rebase it on master if necessary, and then it'll be easier to work on the native binding improvements that we discussed.

I do not mind; we can rebase it and merge it.

What is your plan with regard to the error reporting work?

yuslepukhin avatar Apr 20 '22 18:04 yuslepukhin

Can we get this integrated now that the 1.11 release has happened? I can rebase it on master if necessary, and then it'll be easier to work on the native binding improvements that we discussed.

I do not mind; we can rebase it and merge it.

What is your plan with regard to the error reporting work?

Once this has been merged, I'll start making smaller PRs for the different JNI files so it's not as unpleasant to review. The initial one will be bigger because it'll contain the error handling changes in OrtJniUtil.c and the functions called from the session run method. The rest should be smaller and simpler.

Craigacp avatar Apr 20 '22 21:04 Craigacp

Is it possible to get this branch merged in? I've fixed the merge conflict, and the JNI rewrite I've been doing started from this branch.

Craigacp avatar Jun 15 '22 02:06 Craigacp

The linter seems to be treating my C code as C++, as it's complaining that I'm using C-style casts. Aside from that issue, is there a code formatter configuration I can run over the JNI code? I can use IDE plugins for VS Code or CLion, but I don't have access to Visual Studio.

Craigacp avatar Jun 15 '22 03:06 Craigacp

I've started work on updating this branch with the new error handling code, but it's not ready for review yet.

Craigacp avatar Sep 13 '22 02:09 Craigacp

This is ready for review now that I've finished updating it with the new JNI error handling.

Craigacp avatar Sep 22 '22 02:09 Craigacp

This should be ready for review again.

Craigacp avatar Oct 04 '22 17:10 Craigacp

@yuslepukhin would you mind reviewing this again?

Craigacp avatar Nov 07 '22 20:11 Craigacp

Copying or not, we want to make sure that the output tensors are deallocated when no longer needed.


In reply to: 1054694679

yuslepukhin avatar Nov 07 '22 23:11 yuslepukhin

public static final int MAX_DIMENSIONS = 8;

We did some profiling; the max dim we ever hit was 5.


In reply to: 1306361595


Refers to: java/src/main/java/ai/onnxruntime/TensorInfo.java:15 in 339f78f.

yuslepukhin avatar Nov 07 '22 23:11 yuslepukhin

Copying or not, we want to make sure that the output tensors are deallocated when no longer needed.

In reply to: 1054694679

OnnxSparseTensor is AutoCloseable and deallocates the native memory on close. The SparseTensor class is entirely Java-side, and the ByteBuffers it contains are managed by the garbage collector.
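So the usual try-with-resources pattern scopes the native memory. A small sketch reusing the env and coo placeholders from the creation sketch in the description:

```java
// The native OrtValue lives exactly as long as the try block; the Java-side
// data copied out of it is ordinary heap data, left to the garbage collector.
try (OnnxSparseTensor tensor = OnnxSparseTensor.createSparseTensor(env, coo)) {
    Object javaSide = tensor.getValue(); // copies out of native memory
} // close() releases the native OrtValue here
```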

Craigacp avatar Nov 08 '22 01:11 Craigacp

/azp run MacOS CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-python-checks-ci-pipeline

yuslepukhin avatar Nov 15 '22 18:11 yuslepukhin

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline

yuslepukhin avatar Nov 15 '22 18:11 yuslepukhin