Add KHR_interactivity draft
Rendered version: https://github.com/KhronosGroup/glTF/blob/interactivity/extensions/2.0/Khronos/KHR_interactivity/Specification.adoc
Great work! What sections are next? Also what could we release as a draft extension for SIGGRAPH 2023?
I'd like to echo @aaronfranke's comment above regarding the current state of the KHR_interactivity spec. There are some critical issues that need addressing:
Key Concerns
- Incomplete / Missing Files:
  - More time, example files, and implementations are needed for a spec like this, especially given its ties to other extensions like physics and audio. The quality bar for this spec should be above average for this reason.
- Public Comment Process:
  - Clarification is needed on the feedback process mentioned in the blog post. Specifically:
    - What is the exact time frame for public comment?
    - When can we expect responses to previous comments left on the PR?
- Development:
  - Attention should be given to `KHR_audio_emitter` if it is to be used within `KHR_interactivity`.
  - JSON schemas and example files are missing from the current PR.
  - Specify example implementations and supported game engines.
Rushing to ratification could be detrimental or harm public perception of the standards process. At the very least can we extend the public review period for thorough testing and validation to ensure KHR_interactivity becomes a robust and well-developed standard? Thanks!
But there will be the question of whether there should also be a `float3x3`, or `intN` types, or `doubleN` types, or whether a "value type" of a socket could also be a `string`, or whether there is any consideration for how to handle arrays.
- Matrices other than 4x4 are trivially possible, but adding them in advance would further increase the required test/sample coverage, so we'd like to have specific use cases first. Same with integer vectors.
- `doubleN` is not needed as all floats here always have double precision.
- Strings and arrays are not included to avoid dynamic memory allocations in the type system for now.
Section "3.6.3.1.7. Multi Gate":
On a brainstorming level: I wonder whether it could make sense to split this into "Multi Gate" and "Random Multi Gate". Yes, the `isRandom` flag can be toggled at runtime.
It cannot. Node configuration is static, i.e., it's a compile-time flag.
But even if nothing is changed there: It is not clear (i.e. not specified) whether the order will be the same in each loop pass, or whether it will be randomized for each pass.
It certainly does not have to be the same each time. We'll clarify whether it's explicitly randomized.
Section "3.6.3.1.8. Wait All":
In that context, I wondered whether there should be some sort of "(count down) latch": something that activates an output if an input was activated `N` times.
We decided that implementing that with a custom variable and a branch is simple enough to not have a dedicated node. The "Wait All" is warranted because implementing it without arrays is too burdensome.
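For illustration, here is a rough sketch of that variable-plus-branch pattern, assuming node types named `variable/get`, `variable/set`, `math/sub`, `math/eq`, and `flow/branch` as in the draft. The configuration/flow/socket layout and type indices below are guessed from the array-style example later in this thread, and the counter variable (index 0, not shown) would be initialized to `N`. All upstream activations target the `variable/set` node, and the branch fires its `true` flow once the counter reaches zero:

```json
{
  "nodes": [
    {
      "type": "variable/get",
      "configuration": [ { "id": "variable", "value": 0 } ]
    },
    {
      "type": "math/sub",
      "values": [
        { "id": "a", "node": 0, "socket": "value" },
        { "id": "b", "type": 0, "value": [ 1 ] }
      ]
    },
    {
      "type": "variable/set",
      "configuration": [ { "id": "variable", "value": 0 } ],
      "values": [ { "id": "value", "node": 1, "socket": "value" } ],
      "flows": [ { "id": "out", "node": 4, "socket": "in" } ]
    },
    {
      "type": "math/eq",
      "values": [
        { "id": "a", "node": 0, "socket": "value" },
        { "id": "b", "type": 0, "value": [ 0 ] }
      ]
    },
    {
      "type": "flow/branch",
      "values": [ { "id": "condition", "node": 3, "socket": "value" } ]
    }
  ]
}
```

The branch's outgoing `true` flow (whatever should run once all activations have arrived) is omitted here.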
Section "3.6.3.2.1. Set Delay":
An important, high-level question: How much precision (in terms of timing) is expected from an implementation here?
TBD, subject to implementors' feedback.
Section "3.6.4.2.3. Pointer Interpolate"
Too complex for me to think it through right now, but a high-level question: Couldn't this be emulated with a custom graph and just setters?
Having this node allows a graph to easily start multiple interpolations without manually tracking them. Implementing the same logic with a custom graph would be very inconvenient without arrays.
To what extent should the specification address things like "compound nodes"?
TBD, likely in the next revision.
Maybe the `isRandom` could be replaced by a "random seed" (with 0 = not random, or so).
0 is a valid seed, so using 0 to mean "not random" should not be a thing. If anything should be changed, separating out the random and non-random versions makes the most sense to me.
Section "3.6.3.2.1. Set Delay":
An important, high-level question: How much precision (in terms of timing) is expected from an implementation here?
This would be an unreasonable requirement to impose on implementers, and would produce unexpected results for glTF asset authors. If I delay for 0.5 seconds in a glTF, I would expect this to delay 0.5 seconds from the point in time when the delay block was executed, not 0.5 seconds after the last delay.
As a solution to the problem of a race condition between 1.0 and 0.5 + 0.49999, I would suggest encouraging glTF asset authors to write behavior graphs that are robust to race conditions and will function in either case. Even if race conditions caused by delays were resolved by some convoluted mechanism, there is still the problem of race conditions in general. Trying to resolve all race conditions at the glTF implementation level is an unsolvable problem on the level of the halting problem; there are an infinite number of cases to consider on an infinite number of systems.
To what extent should the specification address things like "compound nodes"?
TBD, likely in the next revision.
Being able to build "functions" out of other nodes seems like a highly useful feature, with wide-reaching implications for how assets and implementations interact with each other (such as if one glTF provides a "library" for another to use, probably with both glTF files as part of a unified experience in a glXF file). It may be important enough to have in v1.
In a way, this would be similar to how glTF itself defines in its base specification how it can be extended using "extensions" as an explicitly allowed part of the schema. The big difference being that the implementation of the behavior graph "extension" could be itself provided by another behavior graph.
Yet it may also be highly useful to define a function at the glTF implementation level, to allow implementers to expose new functionality for the behavior graph to use. I did a search of the KHR_interactivity document for "exten" and it seems that only extending the type system is defined in "4.2. Types"; KHR_interactivity does not define any other recommended ways to extend it. I could imagine a glTF extension defining its own `mynamespace/myblock` behavior graph node, but it may be better if KHR_interactivity recommends a pattern for this, such as `ext/KHR_something/some_func`, mirroring glTF's own "extensions".
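To illustrate what such a recommended pattern could enable (a purely hypothetical node type and socket, reusing the array-style layout from the example later in this thread):

```json
{
  "type": "ext/KHR_something/some_func",
  "values": [
    { "id": "a", "type": 1, "value": [ 0.0, 0.0 ] }
  ]
}
```

An implementation that does not know `KHR_something` would then have a well-defined way to detect and reject (or ignore) such a node, just as with unknown glTF extensions.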
Matrices other than 4x4 are trivially possible but adding them in advance would further increase required test/sample coverage so we'd like to have specific use cases first. Same with integer vectors.
Looking beneath the surface, the question refers to what I already brought up early in the process, when the discussion about KHR_interactivity started. Namely, how powerful, generic, and extensible the type system should be. Right now, we have float4x4. We could add float3x3, float2x2, float4x3, and later maybe int2, int3, int4 (and find some relief in the fact that this still revolves around JSON, and the necessity to differentiate between (at least) eight different types (u)int(8/16/32/64) is ruled out from the beginning).
The deeper question is: at which point will the effort of adding a new type become prohibitively large? Depending on how many types have to be added, there will inevitably be a point where the initial difficulties that come with trying to define a type system that includes things like matrix<rows, cols, componentType> are outweighed by the effort of adding new types in a system where each addition triggers a combinatorial explosion.
But to emphasize that: I'm NOT advocating for generalizing the type system to include such "parameterized types" (The effort is so high that even most "real" programming languages don't get this right...). I just want to make sure that people are aware of the possible limitations of the current approach in terms of evolvability.
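Just to make the idea concrete (and again, explicitly not as a proposal): a parameterized type declaration could hypothetically look like this, instead of enumerating every combination as its own signature:

```json
{
  "signature": "matrix",
  "rows": 3,
  "cols": 3,
  "componentType": "float"
}
```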
I wonder whether it could make sense to split this into "Multi Gate" and "Random Multi Gate". Yes, the isRandom flag can be toggled at runtime.
It cannot. Node configuration is static, i.e., it's a compile-time flag.
That sounds like it could be beneficial to make the distinction between "random" and "not random" explicit via different nodes. At least, I could imagine that the implementation for both of them could be easier (and maybe more elegant and efficient) than trying to squeeze that "(optional) randomness" into this single node. (But of course, all that is a gut feeling for now - allocating enough time to actually try and implement all that may be a challenge...)
@aaronfranke
0 is a valid seed, so using 0 to mean "not random" should not be a thing.
I expected that :-) And "giving a special meaning to a certain value" is something that I usually like to avoid. It was just an attempt to describe some sort of "middle ground"/"compromise" for the two main points of that comment:
- The "optional randomness" may warrant a new node type (and you seem to not be opposed to that in general)
- Whenever there is anything "random", there should definitely be a "random seed".
I know, there is no built-in functionality for the latter in JavaScript. But this is one point where the quirks and shortcomings of this "language" should not leak through. Proper testing and debugging with a seedless 'random' is a pain in the back (or even impossible in many cases).
Trying to resolve all race conditions at the glTF implementation level is an unsolvable problem on the level of the halting problem
Yes! And that's the main (broader) point of that (suggestive) example of 1.0 vs. 0.5 + 0.49999. Another way of phrasing it (similarly suggestive, and completely made-up "pseudocode") would be a test case for an execution engine:
var value = 0;
setDelay(1.0).decrement(value); // Happens last
setDelay(0.5).setDelay(0.49998).increment(value); // Happens first
setDelay(0.9999).assert(value == 1); // Happens in-between
Again: I'm not proposing (or advocating for) anything. I just wanted to bring up that question, and if the answer is ~"there are no specific timing guarantees", then this could be fine, but might have to be elaborated further in the specs. Otherwise, someone could build a graph that resembles the pseudocode above (even unintentionally, due to assumptions that are made about the exact execution order), and may receive some erroneous result (like "divide by zero" or whatnot).
Being able to build "functions" out of other nodes seems like a highly useful feature, with wide-reaching implications for how assets and implementations interact with each other (such as if one glTF provides a "library" for another to use, probably with both glTF files as part of a unified experience in a glXF file). It may be important enough to have in v1.
It would certainly be useful. And it could be "important" (to keep the graphs and their complexity manageable in the long run). But as you said: It opens a whole can of worms, and should be thought through very carefully. I do have some of the related questions on the radar, but will abstain from further comments until it is supposed to be addressed explicitly.
The meta-options of
- "adding it in v1" vs. "adding it later in a v2" (implying a "breaking change"?!?)
- omitting it in v1, but adding some `KHR_interactivity_compound_node_library` extension later

will probably be part of the considerations here.
Meant to post these sooner: some notes from the X Spaces that we hosted a couple of weeks ago with @bhouston and other Khronos contributors + the OMI group:
X Space 6-20-24
https://x.com/open_metaverse/status/1803902289742303743
- The interactivity spec originated from Adobe's work on trigger-action lists for Adobe Aero and USDZ.
- There was debate between trigger-action lists and behavior graphs approaches, with behavior graphs ultimately chosen.
- The spec development process involved studying existing systems like Unreal Blueprints and Unity Visual Scripting.
- The goal was to create a "boring" standard that consolidates best practices rather than innovating.
- The spec introduces concepts like the glTF object model, events, and variables that can be built upon by future extensions.
- It's designed with a layered approach, allowing different levels of capability and security.
- There are concerns about the spec being incomplete, lacking examples, JSON schemas, and other expected components. :star:
- The current implementation is limited in terms of data types and capabilities compared to full programming languages.
- There's discussion about potential future work, including adding more complex data types and operations.
- Security considerations include limiting node execution time and restricting memory allocation.
- There's debate about whether the spec could support compilation from text-based languages to behavior graphs.
- The spec is seen as a foundation for future developments, potentially including WASM integration.
- There are questions about how the spec will be implemented in various engines and viewers.
- The community is encouraged to try the reference implementation and provide feedback.
- There are plans for more discussions and spaces to gather community input before ratification.
- The spec is part of a larger ecosystem of glTF extensions, including physics, audio, and procedural materials.
X Space 6-21-24
https://x.com/glTF3D/status/1803548289134109034
- The genesis of the interactivity spec came from Adobe's work on trigger-action lists for USDZ and Adobe Aero.
- There was debate between trigger-action lists, behavior graphs, and WASM approaches, with behavior graphs ultimately chosen.
- Security considerations were a major factor in the design, leading to a more constrained system than arbitrary JavaScript.
- The spec builds on the glTF object model and the Animation Pointer extension.
- Custom events allow GLTFs to send and receive messages, potentially enabling communication between nested GLTFs.
- The spec is designed to be flexible but with performance considerations in mind.
- Implementations are expected to limit execution time to maintain performance.
- There are ongoing efforts to finalize related extensions like audio and physics.
- Google is working on implementing the spec for use in Google Maps and other products.
- There are concerns about the timeline feeling rushed and the need for more example assets and supporting materials.
- Godot devs expressed interest in the spec but also raised concerns about other needed features, like consistent UUIDs for nodes across exports.
- There was discussion about the challenge of maintaining unique identifiers for nodes when optimizing or merging assets.
- The Blender team is exploring how to integrate the spec with their geometry nodes system.
- There are some concerns about the expansion of glTF's scope and potential performance implications.
- The importance of having multiple implementations before ratification was emphasized.
- The community was encouraged to contribute to the implementation efforts, particularly in projects like three.js.
- The spec is not intended to replace game engines but to enable interactivity for simpler use cases.
I apologize if this has already been mentioned somewhere. I am currently reviewing the Interactivity Extension Specification. How are the p1 and p2 parameters used in the `pointer/interpolate` node?
A couple of practical questions based on my experiments:
- For looping an animation, is the expected approach to set endTime to positive infinity?
- For preventing animations from overlapping when multiple animations target the same node, is the expected approach to just stop all animations that might be playing, or to add an internal state of which animation was last played and then stop that?
- ~~For interpolating rotations, how would the most-used quaternion interpolation – slerp – be implemented? I'm not sure a bezier curve (p1/p2) can be mapped to a spherical interpolation.~~ Answer: quaternion interpolation is specified as using slerp already. The bezier curve is for the easing function for the `t` parameter going into the interpolation.
- ~~Can the same output flow socket be connected to multiple input flow sockets, or must I add a `flow/sequence` node in between? For me that's not clear from the spec. I would expect that, similar to values, I can just connect those outputs to multiple inputs and they're all triggered in the order they appear in the JSON.~~ Answer: one output flow socket can only be connected to one input flow socket.
- I find `flow/sequence` to be confusingly named. It's not clear to me if "all output flows are activated one by one" means that each flow output first needs to finish before the next one is invoked, or that they are all fired immediately and just the order is specified here. For reference, other flow graph implementations that I'm aware of use `sequence` for "sequential" execution with waiting, and `parallel` for in-order parallel execution of the connected flows.
- For interpolating rotations, how would the most-used quaternion interpolation – slerp – be implemented? I'm not sure a bezier curve (p1/p2) can be mapped to a spherical interpolation.
My understanding is that the control points specified by an interpolation node are to be used as the control points of a Bézier spline that is used as an easing function, à la cubic Bézier easing functions in CSS and elsewhere.
If that's correct, then the implementation would pass the normalized time parameter (on [0, 1]) to the easing function to get the eased progress fraction, then use that as the "time" parameter of the lower-level interpolation routine (whether lerp or slerp).
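If that reading is correct, the composition would be roughly the following (standard cubic Bézier and slerp formulas, not quoted from the spec text). With control points $(0,0)$, $p_1$, $p_2$, $(1,1)$ and normalized time $t \in [0,1]$,

$$
B(s) = 3(1-s)^2 s \, p_1 + 3(1-s)\, s^2 \, p_2 + s^3 \, (1,1), \qquad s \in [0,1],
$$

one solves $B_x(s) = t$ for $s$, takes the eased progress $t' = B_y(s)$, and feeds $t'$ into the lower-level interpolation, e.g. for quaternions

$$
\operatorname{slerp}(q_0, q_1; t') = \frac{\sin\big((1-t')\,\theta\big)}{\sin\theta}\, q_0 + \frac{\sin\big(t'\,\theta\big)}{\sin\theta}\, q_1, \qquad \cos\theta = q_0 \cdot q_1 .
$$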
Thanks! I found it now. The spec states that it's linear interpolation but also has this remark:
If the Object Model property is a quaternion, spherical linear interpolation expression SHOULD be used.
Thank you for your response. These parameters seem a bit challenging for designers to handle.
Regarding the question I asked earlier, it seems it might have gotten buried and become hard to notice, so I'd like to ask again. I couldn't find `node/OnSelect` (`event/OnSelect`) in the specification, but is it possible to handle events when an object is selected?
I couldn't find `node/OnSelect` (`event/OnSelect`) in the specification, but is it possible to handle events when an object is selected?
This event is defined in the `KHR_node_selectability` extension, see #2422.
I couldn't find `node/OnSelect` (`event/OnSelect`) in the specification, but is it possible to handle events when an object is selected?
This event is defined in the `KHR_node_selectability` extension, see #2422.
Oh, I thought this extension was an attribute to specify whether an object can be selected. I’ll read through the draft. Thanks!
How would a connected graph with multiple output value sockets, as shown in the diagram, be represented in glTF?
Not a definitive answer to my own question, but would there be any issue with this kind of definition?
"nodes": [
{
"type": "math/sign",
"values": [
{
"id": "a",
"node": 2,
"socket": "1"
}
]
},
{
"type": "math/sign",
"values": [
{
"id": "a",
"node": 2,
"socket": "0"
}
]
},
{
"type": "math/extract2",
"values": [
{
"id": "a",
"type": 1,
"value": [
0.0,
0.0
]
}
]
}
],
I believe this matches the output of https://github.com/KhronosGroup/glTF-InteractivityGraph-AuthoringTool/tree/initial-work-merge (minus the missing type declarations in your partial file).
Thank you. I'm comparing the behavior of my custom tool with the authoring tool as a reference. By the way, is there a way to delete a node once it's placed on the graph in this tool? Currently, I'm restarting the tool to clear the graph.
@Hackn0214 backspace seems to remove just fine?
@Hackn0214 backspace seems to remove just fine?

Ah... I was pressing the Delete key.
Suppose a custom event is defined as shown below, and `event/receive` and `event/send` nodes refer to this custom event.
"events": [
{
"id": "MyEvent",
"values": [
{
"id": "Val_1",
"type": 0
},
{
"id": "Val_2",
"type": 0
}
]
}
],
I think the generated node will have sockets as shown below, but what do you think?
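Roughly, I would expect the event's value ids to become socket ids: `event/send` gets input value sockets `Val_1` and `Val_2`, and `event/receive` exposes matching output value sockets. A hypothetical sketch, mirroring the array-style `values` layout from the earlier example (configuration and flow syntax omitted because I am not sure of it, and the type indices are placeholders):

```json
{
  "nodes": [
    {
      "type": "event/send",
      "values": [
        { "id": "Val_1", "type": 0, "value": [ 0.0 ] },
        { "id": "Val_2", "type": 0, "value": [ 0.0 ] }
      ]
    },
    {
      "type": "event/receive"
    }
  ]
}
```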
Is it possible to return the value pointed to by a pointer with the behavior graph? For example, I would like to change the base color of a selected object's material.
/nodes/{nodeIdx}/mesh -> meshIdx
/meshes/{meshIdx}/primitives/0/material -> materialIdx
/materials/{materialIdx}/pbrMetallicRoughness/baseColorFactor
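To make the question concrete, the kind of chaining I have in mind would look roughly like this. This is purely hypothetical: I do not know whether the draft's `pointer/get` / `pointer/set` nodes support templated pointers with value inputs this way; the array-style layout mirrors the earlier example in this thread, type indices are placeholders, flows are omitted, and the literal node index stands in for whatever the selection event provides:

```json
{
  "nodes": [
    {
      "type": "pointer/get",
      "configuration": [ { "id": "pointer", "value": "/nodes/{nodeIndex}/mesh" } ],
      "values": [ { "id": "nodeIndex", "type": 0, "value": [ 2 ] } ]
    },
    {
      "type": "pointer/get",
      "configuration": [ { "id": "pointer", "value": "/meshes/{meshIndex}/primitives/0/material" } ],
      "values": [ { "id": "meshIndex", "node": 0, "socket": "value" } ]
    },
    {
      "type": "pointer/set",
      "configuration": [ { "id": "pointer", "value": "/materials/{materialIndex}/pbrMetallicRoughness/baseColorFactor" } ],
      "values": [
        { "id": "materialIndex", "node": 1, "socket": "value" },
        { "id": "value", "type": 2, "value": [ 1.0, 0.0, 0.0, 1.0 ] }
      ]
    }
  ]
}
```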
If a configuration value is an array of length 2, 3, 4, or 16, it could be ambiguous whether it's a vec2/vec3/vec4/vec4x4 or an int[]. This might not be a problem for JS, but it is a problem for strongly-typed implementations. Is there any recommendation for how they should disambiguate?
For example, how about specifying the type at the time of the call (see the sketch at the end of this comment)? In fact, I would also like to have a feature that returns the number of elements, as shown below:
/materials -> total number of materials in the scene
/meshes/0/primitives -> total number of primitives
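Regarding the disambiguation suggestion above, for illustration (hypothetical; the `type` field here would index the graph's types array, just as the `values` entries in the earlier example already do): a bare configuration entry like

```json
{ "id": "someConfig", "value": [ 1.0, 0.0, 0.0, 1.0 ] }
```

could be a float4 or an integer array, whereas an explicit type reference such as

```json
{ "id": "someConfig", "type": 2, "value": [ 1.0, 0.0, 0.0, 1.0 ] }
```

would remove the ambiguity for strongly-typed implementations.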
@Hackn0214 you can already get the number of materials when using the appropriate pointer; check out the core pointers.
That's wonderful! Thank you!
I would appreciate it if you could add `/meshes/{}/primitives.length`.
Is there a (reasonably up-to-date) machine-processable representation of this? (I.e. something like a schema or a "repository" of node structure descriptions)?
Is there a (reasonably up-to-date) machine-processable representation of this? (I.e. something like a schema or a "repository" of node structure descriptions)?
Not yet, this document is the only normative source for now. We'll publish regular JSON schemas after confirming that the current early implementations are aligned with them.
That said, JSON schemas alone are not enough since most operations have their own predefined socket types and ids. Suggestions on the machine-readable node spec representations are welcome.
It could be something simple as a starter....
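For example, a purely hypothetical entry with invented field names; the node type and socket ids are what one would expect for the Cross Product operation, not copied from the spec:

```json
{
  "type": "math/cross",
  "inputs": [
    { "id": "a", "types": [ "float3" ] },
    { "id": "b", "types": [ "float3" ] }
  ],
  "outputs": [
    { "id": "value", "types": [ "float3" ] }
  ]
}
```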
One could/should probably group this based on the Section structure, maybe even include the headers like ===== Cross Product or so. One could also think about parsing and understanding the types...