[js/webnn] Enable user-supplied MLContext
Description
This PR enables the API added in #20816 and moves context creation to JS.
Motivation and Context
To enable I/O binding with the upcoming MLBuffer API in the WebNN specification, we need to share the same MLContext across multiple sessions, because MLBuffers are restricted to the MLContext in which they were created. This PR enables developers to reuse one MLContext across multiple sessions.
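For illustration, here is a minimal sketch of what sharing a user-created MLContext across sessions could look like. The `context` execution provider option, the model file names, and the availability of WebNN type declarations (`navigator.ml`, `MLContext`) are assumptions for this sketch, not a confirmed API surface:

```ts
import * as ort from 'onnxruntime-web';

// Create one MLContext up front and hand it to every session that needs it.
const mlContext = await navigator.ml.createContext({ powerPreference: 'high-performance' });

// Hypothetical `context` option: both sessions share the same MLContext,
// so MLBuffers created on it could be I/O-bound to either session.
const sessionA = await ort.InferenceSession.create('model_a.onnx', {
  executionProviders: [{ name: 'webnn', context: mlContext }],
});
const sessionB = await ort.InferenceSession.create('model_b.onnx', {
  executionProviders: [{ name: 'webnn', context: mlContext }],
});
```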
@fs-eire, @guschmue, @fdwr, PTAL, thanks!
The current implementation has a few problems:
- If the user specifies multiple execution providers (which is legitimate in ORT), e.g. `['webgpu', { name: 'webnn', powerPreference: 'high-performance' }]`, it is hard to tell from the JavaScript code which EP is currently in use. Once WebNN is initialized, JavaScript has no way to know which EP will actually run the current model, and reading the execution provider config is not sufficient to tell.
- The current implementation splits the initialization code across multiple places: the C++ code does a few things and the JavaScript does others. The [webnn session option] to [context ID] mapping is 1:1, and the [session ID] to [context ID] mapping is many-to-one. In my understanding, a singleton map in C++ would be a better way to implement this requirement: it is easier to read and understand, and keeping related code together reduces the chance of introducing bugs in future changes. Please let me know if I have misunderstood this part.
- Modifying the user input (session options) is a concern. This is usually not expected behavior, yet the current implementation depends on adding a property to the user-specified session options.
I have moved the context de-duplication to a singleton in C++.
Adding a few comments here:
A new issue (#20729) reveals a clearer picture of what the actual requirement looks like: users may want to manipulate the MLContext with more flexibility. I am currently thinking it may be a good idea to let users create the MLContext themselves and just pass it to ORT via session options.
Considering the latest spec (https://www.w3.org/TR/webnn/#api-ml-createcontext): there will be WebNN-WebGPU interop, and createContext() may accept a WebGPU GPUDevice object. This would be even more difficult to implement inside ORT, so we should just let users do that part themselves.
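A rough sketch of that interop path, following the spec draft (assuming WebGPU and WebNN type declarations are available):

```ts
// Per the WebNN spec draft, createContext() may accept a GPUDevice,
// tying the resulting MLContext to that WebGPU device.
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error('WebGPU is not available');
}
const gpuDevice = await adapter.requestDevice();

// Creating the context from a GPUDevice is exactly the kind of setup that
// is hard to do inside ORT, hence letting the user supply the context.
const mlContext = await navigator.ml.createContext(gpuDevice);
```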
@fs-eire given that the API changes have landed (#20816), should I modify this PR to use the new API directly, or should I convert the code back to JS?
Since the API may accept an MLContext as user input, we need to pass the MLContext from JS to C++ anyway. So it may be a good idea to create and manage the MLContext in JS.
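As a minimal sketch (not this PR's actual code) of what JS-side management could look like, assuming a cache keyed by JSON-serialized context options:

```ts
// De-duplicate MLContexts in JS: sessions created with identical WebNN
// options share a single cached MLContext.
const contextCache = new Map<string, MLContext>();

async function getOrCreateContext(options: MLContextOptions): Promise<MLContext> {
  const key = JSON.stringify(options); // assumes options are JSON-serializable
  let context = contextCache.get(key);
  if (context === undefined) {
    context = await navigator.ml.createContext(options);
    contextCache.set(key, context);
  }
  return context;
}
```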
@fs-eire I have updated the PR to use the new API, PTAL.
@fs-eire, gently ping. :)
/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline
Azure Pipelines successfully started running 3 pipeline(s).
@guschmue I have fixed the linting issues. That said, I'm concerned about adding `// Copyright (c) Microsoft Corporation. All rights reserved.` to the webnn type declaration file that will eventually be moved to webmachinelearning/webnn-types. Do you mind rerunning the CI?
/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models
Azure Pipelines successfully started running 3 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models
Azure Pipelines successfully started running 7 pipeline(s).