proposals icon indicating copy to clipboard operation
proposals copied to clipboard

A faster, parallelizable querySelectorAll

Open LifeIsStrange opened this issue 3 years ago • 7 comments

Speeding up DOM accesses would eliminate bottlenecks for a wide range of applications. I recently discovered that the spec mandate querySelectorAll() to return elements in the document order. This behavior is useful and is a nice default. However, if I am not mistaken, the algorithm used for implementing a querySelectorAll is in essence a recursive (DOM) tree search and while regular Breadth first search or depth first search or other hybrid of a tree search can be easily parallelized/multithreaded, the constraint of retrieving elements in document order either:

  1. Mostly prevents parallelism.
  2. induce slowdowns through lockings/syncs and efforts in maintaining and joining in order during the search.

I understand that web browsers have clever implementations but it seems to me that the constraint or retrieving elements in document order, inherently limits parallelism/performance to some extent. Therefore it seems to make sense to create a variant, named for example querySelectorAllUnordered() that would behave exactly the same as querySelectorAll, except that the order constraint would be alleviated, hence allowing implementors to have a faster, more parallel implementation of the tree search. This has the potential to make the web platform faster and seems like a low hanging fruit.

LifeIsStrange avatar Jan 30 '22 23:01 LifeIsStrange

@mfreed7 - do you think that this would be a interesting path to optimize querySelectAll further?

yoavweiss avatar May 29 '24 07:05 yoavweiss

I do think it's an interesting idea, but I would debate the "low hanging fruit" part of the rationale. Most DOM APIs (querySelectorAll included) are implemented in C++ single-threaded. There are a few very-hot code paths where we do some sort of threaded optimization, but there's a lot of overhead and risk in adding threading for things like this.

Also note that querySelector is highly optimized already in all engines, with a lot of caching. I'd guess it'd take some effort to make that caching thread-safe enough to gain an advantage here.

So while this type of thing is an interesting idea, I'm not sure I'd put it near the top of my list of optimization ideas.

mfreed7 avatar May 31 '24 00:05 mfreed7

We also discussed this on the WebKit side with our DOM folks, and came to similar conclusions to @mfreed7. The presumption is that the DOM is thread safe, which in many cases it is not, so having to make it thread safe would mean adding locks, potentially impacting the performance of things like DOM mutations.

What's might also happen is that querySelectorAllUnordered() might consistently return ordered results in one engine, and developers (despite it being explicitly named "Unordered") end up relying on the order. When running the same query on another engine, then things might break for them in fun and unexpected ways.

NB: this happened to Gecko, IIRC, where attribute order was unordered, and they had to switch it to ordered for web compat reasons.

A better path might be to see if browsers can squeeze more performance out of querySelectorAll. As @mfreed7 mentioned, implementations are highly optimized and do a lot of caching already, but there's potentially more things browsers could do.

@LifeIsStrange, there's alternative proposals like, selection observers that might also be part of the solution here.

@LifeIsStrange, given the responses above, are you ok if we close this proposal?

marcoscaceres avatar Jun 14 '24 06:06 marcoscaceres

Consider returning an iterator instead of NodeList.

This would allow for early returns and provide ample time for prefetching the next element.

for (const child of ele.querySelectors()) {
  // get a child and perform some work, which typically consumes some JS time.
  // While JS time is being processed, querySelectors continues to work, preparing for the next element's return.
  // This allows multiple concurrent querySelectors to better utilize CPU time through interleaving optimization.
}

Gaubee avatar Jun 18 '24 01:06 Gaubee

  // get a child and perform some work, which typically consumes some JS time.
  // While JS time is being processed, querySelectors continues to work, preparing for the next element's return.

This is the problem right here. Those both currently happen on the same thread. And the first can also affect the result of the second.

mfreed7 avatar Jun 18 '24 02:06 mfreed7

  // get a child and perform some work, which typically consumes some JS time.
  // While JS time is being processed, querySelectors continues to work, preparing for the next element's return.

This is the problem right here. Those both currently happen on the same thread. And the first can also affect the result of the second.

You are right, but it's not an unsolvable problem, depending on how we define it.

Typically, our ultimate goal is to align with user intuition, similar to the iteration of JavaScript's map and set.

Therefore, we can refer to the intent of the following pseudocode:

// source
for (const child of ele.querySelectors(selector)) {

}
// profill
let visitedNodes = new Array<Element>();
while (true) {
  const pre = visitedNodes.findLast((node) => ele.contains(node));
  const next = queryNext(pre);
  if (next === undefined) {
    return;
  }
  visitedNodes.push(next);
  yield next;
}

// queryNext
function queryNext(start?: Element) {
  const all = getAll();
  if (start) {
    return all[all.indexOf(start) + 1];
  }
  return all[0];
}

// getAllWithCache
let cache: NodeList<Element> | undefined;
function getAll() {
  return (cache ??= ele.querySelectorAll(selector));
}
const observer = new MutationObserver(() => {
  cache = undefined; // Clean cache
});
observer.observe(ele, { attributes: true, childList: true, subtree: true });

Gaubee avatar Jun 18 '24 03:06 Gaubee

  // get a child and perform some work, which typically consumes some JS time.
  // While JS time is being processed, querySelectors continues to work, preparing for the next element's return.

This is the problem right here. Those both currently happen on the same thread. And the first can also affect the result of the second.

profill v2

HTMLElement.prototype.querySelectors = function* (selector) {
  const querySessionId = "x-" + crypto.randomUUID();

  const observer = new MutationObserver(() => {
    prepare = undefined; // Clean cache
  });
  observer.observe(this, {
    attributes: true,
    childList: true,
    subtree: true,
  });
  let prepare;
  let hasTask = false;
  const visited = [];
  try {
    while (true) {
      const next =
        prepare || this.querySelector(selector + `:not([${querySessionId}])`); // use pre-query or query next
      prepare = undefined; // consume cache
      if (next === null) {
        return;
      }

      // in native: start a lightweight task for prepare next-query
      if (hasTask === false) {
        hasTask = true;
        queueMicrotask(() => {
          hasTask = false;
          prepare = this.querySelector(selector + `:not([${querySessionId}])`);
        });
      }

      // in native: mark as queried
      next.setAttribute(querySessionId, "");
      visited.push(next);
      yield next;
    }
  } finally {
    observer.disconnect();
    // in native: remove mark
    visited.forEach((ele) => ele.removeAttribute(querySessionId));
  }
};

Gaubee avatar Jun 18 '24 03:06 Gaubee