
feat(functions): Add hashProperty(...) function

Open donmccurdy opened this issue 2 months ago • 8 comments

hashProperty(prop) returns a hash computed from all attributes and references held by the given property, excluding the 'skip' set. Hash collisions are rare but possible, so hashes alone should not be used to check for equality; they can, however, be combined with Property#equals.

See also: https://en.wikipedia.org/wiki/Merkle_tree
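
A minimal sketch of that combination, assuming only that hashProperty(prop) returns a number (the final signature may differ):

  import { Property } from '@gltf-transform/core';

  // Proposed API, assumed to return a number; declared rather than implemented
  // because it is not part of the published package yet.
  declare function hashProperty(prop: Property): number;

  // Group candidate properties by hash, then confirm true duplicates with
  // Property#equals before merging, since hash collisions are possible.
  function findReplacements<T extends Property>(properties: T[]): Map<T, T> {
    const byHash = new Map<number, T[]>();
    const replacements = new Map<T, T>();
    for (const prop of properties) {
      const hash = hashProperty(prop);
      const candidates = byHash.get(hash) ?? [];
      const match = candidates.find((other) => prop.equals(other));
      if (match) {
        replacements.set(prop, match);
      } else {
        candidates.push(prop);
        byHash.set(hash, candidates);
      }
    }
    return replacements;
  }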

Related:

  • https://github.com/donmccurdy/glTF-Transform/issues/1722
  • https://github.com/donmccurdy/glTF-Transform/issues/1708
  • https://github.com/donmccurdy/glTF-Transform/issues/1527
  • https://github.com/donmccurdy/glTF-Transform/issues/881

PR Dependency Tree

  • PR #1757 👈
    • PR #1760
      • PR #1761

This tree was auto-generated by Charcoal

donmccurdy commented Nov 08 '25

Would a "shallow hash" also be added here? A "shallow hash" hashes references instead of values for things like RefSet, RefMap. It can be implemented using a WeakMap, or add a counter when a Property constructed, can reduce a lot of compute when deep hash of values not needed.

WeakMap impl

  // WeakMap is the key to being GC-friendly. It won't prevent garbage
  // collection of objects that are no longer referenced elsewhere.
  const cache = new WeakMap();
  // A simple counter to generate unique IDs for new objects.
  let nextId = 0;
  /**
   * Hashes a value.
   * @param {object} value The value to hash.
   * @returns {number} The hash value.
   */
  function shallowReferenceHash(value) {
    // Check if we have already hashed this exact object reference.
    if (cache.has(value)) {
      return cache.get(value);
    }
    // If it's a new object reference, generate a new ID, store it, and return it.
    const hash = nextId++;
    cache.set(value, hash);
    return hash;
  }
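
A possible follow-up usage, building on shallowReferenceHash above (sketch only): combine the per-reference IDs into a single shallow hash for an ordered list of references, such as the contents of a RefSet, without reading any referenced values.

  // Order-dependent combine of reference IDs; no referenced values are read.
  function shallowListHash(refs: object[]): number {
    let hash = 0;
    for (const ref of refs) {
      hash = (Math.imul(hash, 31) + shallowReferenceHash(ref)) | 0;
    }
    return hash;
  }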

kzhsw commented Nov 08 '25

Hm, I can't think of a reason that a shallow hash would be needed right now. For the use cases linked above, it would risk not matching deep duplicates.

donmccurdy commented Nov 08 '25

> Hm, I can't think of a reason that a shallow hash would be needed right now. For the use cases linked above, it would risk not matching deep duplicates.

Taking animation samplers as an example: if this runs after accessor dedup, there is no need to deep-hash the underlying arrays; just hash/compare the accessor references.
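
A sketch of that idea, using existing AnimationSampler getters from @gltf-transform/core (illustration only, not code from this PR):

  import { AnimationSampler } from '@gltf-transform/core';

  // After accessor dedup, duplicate samplers can be detected by comparing
  // accessor references and interpolation, without re-hashing array contents.
  function samplersShareData(a: AnimationSampler, b: AnimationSampler): boolean {
    return (
      a.getInput() === b.getInput() &&
      a.getOutput() === b.getOutput() &&
      a.getInterpolation() === b.getInterpolation()
    );
  }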

kzhsw commented Nov 10 '25

For that case, I believe we can reuse the same cache argument in sampler.toHash(skip, cache), so when comparing animation samplers we'll already have cached hashes for the accessors precomputed.
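
Sketched out, assuming the toHash(skip, cache) signature above with the cache mapping each Property to its computed hash (the final API may differ):

  import { Document, Property } from '@gltf-transform/core';

  // Warm a shared cache with accessor hashes so that sampler hashes computed
  // afterwards reuse them instead of re-reading array contents. toHash(...) is
  // the API proposed in this thread and is not yet published.
  function warmHashCache(document: Document): Map<Property, number> {
    const cache = new Map<Property, number>();
    const skip = new Set<string>();
    for (const accessor of document.getRoot().listAccessors()) {
      accessor.toHash(skip, cache);
    }
    for (const animation of document.getRoot().listAnimations()) {
      for (const sampler of animation.listSamplers()) {
        sampler.toHash(skip, cache); // accessor hashes come from the cache
      }
    }
    return cache;
  }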

donmccurdy commented Nov 10 '25

> For that case, I believe we can reuse the same cache argument in sampler.toHash(skip, cache), so when comparing animation samplers we'll already have cached hashes for the accessors precomputed.

Yeah, the idea of a cache is great, but how should the cache's lifecycle be managed? For example, when calling property.swap or property.set, would the cache be reset? What happens if the underlying array is updated?

kzhsw commented Nov 10 '25

Calling toHash() will use a new/clean cache unless a user-created cache is passed in. So by default users will get conservative results (no risk of outdated cache) with potentially duplicated work across multiple toHash calls. I don't think the library needs to automatically reuse a cache, track changes, and invalidate the cache as needed. This would be "possible", but I expect very few users to call toHash directly, so adding document-wide change tracking would add performance overhead for little benefit.

For the purposes of dedup(), I'm hoping that a single cache can be created and used throughout the dedup transform step, i.e. nothing that happens within dedup should invalidate the cache. But we'll have to check that this is true.

As to when the cache "should" be invalidated for users or code managing it explicitly: any time a deep-equality check would have changed. Swapping out an accessor for another deeply-equal accessor does not change the hash and wouldn't require a cache reset, but swapping for an accessor with different values would. I suppose it would be possible to prefill the cache with hash values based on some other, custom computation, if one chose to.
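
For example, a sketch of explicit cache management under that rule (again assuming the proposed toHash(skip, cache) cache maps each Property to its hash):

  import { Document, Property } from '@gltf-transform/core';

  function swapSamplerInput(document: Document, cache: Map<Property, number>): void {
    const sampler = document.getRoot().listAnimations()[0].listSamplers()[0];
    const input = sampler.getInput()!;

    // Deeply-equal replacement: deep hashes are unchanged, no cache reset needed.
    sampler.setInput(input.clone());

    // Replacement with different values: the sampler's cached (Merkle) hash is
    // now stale, as are entries for any property that references the sampler.
    sampler.setInput(document.createAccessor().setArray(new Float32Array([0, 1, 2])));
    cache.delete(sampler);
  }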

donmccurdy commented Nov 10 '25

Might need that shallow hash after all! Here's a case where toHash is currently failing:

import test from 'ava';
import { Document } from '@gltf-transform/core';

test('toHash - circular reference', async (t) => {
	// Create a circular reference, meshNode -> skin -> skeletonNode -> meshNode.
	const document = new Document();
	const meshNode = document.createNode('Mesh');
	const skeletonNode = document.createNode('Skeleton').addChild(meshNode);
	const skin = document.createSkin().addJoint(skeletonNode).setSkeleton(skeletonNode);
	meshNode.setMesh(document.createMesh()).setSkin(skin);

	t.is(typeof meshNode.toHash(), 'number', 'computes hash');
});

// -> Maximum call stack size exceeded

That's tricky, because circular references are rare but possible; the only example I know of is skinning. Ideas:

  • (a) make toHash shallow only, don't follow property references
  • (b) special-case the toHash calculation in the Skin class
  • (c) ...figure out a general-purpose solution for hashing cyclical graph nodes?
    • https://cs.stackexchange.com/q/90052/189607
    • https://stackoverflow.com/q/4540600/1314762

donmccurdy commented Nov 12 '25

Resolved with a depth parameter. For the purposes of dedup(), this requires some care when choosing how deeply to compare objects, but that can be handled automatically.
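
A generic sketch of the depth-limited approach, independent of glTF-Transform's types: recursion into references stops after a fixed depth, so cycles terminate instead of overflowing the call stack.

  // Each node carries its own value plus references to other nodes.
  interface HashNode {
    value: number;
    refs: HashNode[];
  }

  function hashWithDepth(node: HashNode, depth: number): number {
    let hash = node.value | 0; // start from the node's own attributes
    if (depth <= 0) return hash; // cut-off: do not follow references further
    for (const ref of node.refs) {
      // Merkle-style combine, hashing children one level shallower.
      hash = (Math.imul(hash, 31) + hashWithDepth(ref, depth - 1)) | 0;
    }
    return hash;
  }

  // A two-node cycle still produces a finite hash, with no stack overflow.
  const a: HashNode = { value: 1, refs: [] };
  const b: HashNode = { value: 2, refs: [a] };
  a.refs.push(b);
  const hash = hashWithDepth(a, 8);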

Next steps:

  • test that hash collision ratio is reasonable / small
  • update benchmarks
  • final tests and merge PR stack

donmccurdy commented Dec 05 '25