Dmitri Smirnov

Results: 141 comments by Dmitri Smirnov

/azp run MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

> Pre-allocation could work, but that would require a rewrite of all the output processing in both the Java and the native code. I'd missed the update to `Run` which...

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar...

/azp run onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, ONNX Runtime Web CI Pipeline

> Can we get this integrated now the 1.11 release has happened? I can rebase it on master if necessary, then it'll be easier to work on the native binding...

Copying or not, we want to make sure that the output tensors are deallocated when no longer needed. --- In reply to: [1054694679](https://github.com/microsoft/onnxruntime/pull/10653#issuecomment-1054694679)
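For illustration, here is a minimal sketch of that contract on the native side, using the public C++ API (the Java binding wraps the equivalent C calls; on the Java side the analogous mechanism is `OrtSession.Result` implementing `AutoCloseable`). The function, names, and single input/output below are placeholders, not code from this PR:

```cpp
#include <onnxruntime_cxx_api.h>

#include <vector>

// Sketch: output-tensor lifetime on the native side. Run() returns
// std::vector<Ort::Value>; each Value owns its native OrtValue and
// releases it when destroyed, so the outputs are freed deterministically.
void RunOnce(Ort::Session& session,
             const char* input_name, Ort::Value& input,
             const char* output_name) {
  std::vector<Ort::Value> outputs =
      session.Run(Ort::RunOptions{nullptr}, &input_name, &input, 1,
                  &output_name, 1);
  // ... consume outputs ...
}  // `outputs` goes out of scope here; the native tensor memory is freed
```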

> public static final int MAX_DIMENSIONS = 8;

We did some profiling; the max dim we ever hit was 5. --- In reply to: [1306361595](https://github.com/microsoft/onnxruntime/pull/10653#issuecomment-1306361595) --- Refers to: java/src/main/java/ai/onnxruntime/TensorInfo.java:15...
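Purely to illustrate what such a cap buys (a hypothetical helper, not code from `TensorInfo.java`): any shape with more dimensions than the cap can be rejected up front, and per the profiling above real models stayed at 5 dims, so 8 leaves headroom.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the Java-side cap; not from the onnxruntime sources.
constexpr std::size_t kMaxDimensions = 8;

bool ShapeWithinLimit(const std::vector<int64_t>& shape) {
  return shape.size() <= kMaxDimensions;
}
```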

I think that captures it. Interestingly enough, when thrust::inclusive_scan was used the output was correct; however, the switch was made to cub::DeviceScan because of its ability to take cuda...

I also found [this](https://forums.developer.nvidia.com/t/how-to-use-thrust-for-each-with-cuda-streams/177797/3) in the forums, implying that in CUDA 11.4 the problem may have been addressed.
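For context, a sketch of the stream-handling difference discussed above: `cub::DeviceScan` accepts an explicit `cudaStream_t` argument, whereas with thrust the stream has to be routed through an execution policy (`thrust::cuda::par.on(stream)`), which is what the forum thread covers. The function and variable names here are illustrative, not the actual onnxruntime kernel:

```cpp
#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Sketch: an inclusive prefix sum enqueued on the caller's stream via CUB.
void InclusiveSumOnStream(const int* d_in, int* d_out, int num_items,
                          cudaStream_t stream) {
  void* d_temp_storage = nullptr;
  size_t temp_storage_bytes = 0;
  // First call only computes the required temporary storage size.
  cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes,
                                d_in, d_out, num_items, stream);
  cudaMalloc(&d_temp_storage, temp_storage_bytes);
  // Second call performs the scan, enqueued on `stream`.
  cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes,
                                d_in, d_out, num_items, stream);
  cudaFree(d_temp_storage);
}
```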