Use ancestor selectors instead of CSS selectors for results
Currently axe-core generates a unique CSS selector for each node returned in the results. Unfortunately, this selector generation is the #1 performance bottleneck in axe-core, especially for pages with a large number of DOM nodes. We've investigated trying to speed up selector generation, but our main problem is that any change to the selectors is a breaking change: if a node produced one CSS selector before, it cannot produce a different one afterwards.
For v5, we should stop generating CSS selectors and instead use the ancestor tree for the target value. This has been shown to be a huge performance gain.
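For context, an ancestry-style target can be computed purely from each node's position in the tree, with no uniqueness query against the document. The sketch below illustrates the idea (it is not axe-core's actual implementation, and the node shape and function names are assumptions for illustration; plain objects stand in for DOM nodes so it runs outside a browser):

```javascript
// Sketch of ancestry-style selector generation. Each step is
// tagName:nth-child(n), computed purely from the node's position,
// so no querySelectorAll uniqueness check is ever needed.

function nthChildIndex(node) {
  // 1-based position among the parent's children
  return node.parent ? node.parent.children.indexOf(node) + 1 : 1;
}

function getAncestrySelector(node) {
  const path = [];
  for (let current = node; current; current = current.parent) {
    path.unshift(`${current.tag}:nth-child(${nthChildIndex(current)})`);
  }
  return path.join(' > ');
}

// Build a tiny tree: html > body > (div, div > p)
const html = { tag: 'html', parent: null, children: [] };
const body = { tag: 'body', parent: html, children: [] };
html.children.push(body);
const div1 = { tag: 'div', parent: body, children: [] };
const div2 = { tag: 'div', parent: body, children: [] };
body.children.push(div1, div2);
const p = { tag: 'p', parent: div2, children: [] };
div2.children.push(p);

console.log(getAncestrySelector(p));
// → html:nth-child(1) > body:nth-child(1) > div:nth-child(2) > p:nth-child(1)
```

The whole computation is a single walk from the node to the root, which is why it scales with tree depth rather than document size.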
I'm a bit confused; are you suggesting that we do this just as an optimization for cases that involve DOMs with very high branching factors, like your investigation in https://github.com/dequelabs/axe-core/issues/4427 was suggesting, or are you actually suggesting that we stop generating non-ancestry selectors at all by default? I very much don't think we should do the latter; they're a lot less meaningful to look at, a lot longer to have to put in result displays, and much more brittle for deduplication purposes.
The latter: stop generating non-ancestry selectors by default. The problem is that getSelectors is the slowest part of the code, and the fix from #4427 only helps certain DOM structures (when there are >2k sibling DOM nodes). Switching to ancestry selectors is a significant speed increase in all cases, regardless of the DOM structure.
For example, #4427 uses a page with 100k nodes, so I tested the cached childIndex code on my playground with 100k nodes, and it took axe 1,997,015.82ms to complete (that's not a typo). Running the same page using only ancestry generation took 239,628.01ms. Not having to call querySelectorAll every time we want to check if a selector is unique (even with IDs there's no guarantee the selector is unique) saves a significant amount of time.
For a real-world test, I used https://s3.amazonaws.com/content.stockpr.com/sec/0000070858-17-000025/bac-331201710xq.htm, which has 150k nodes. Ancestry took 119,380.60ms to complete, while the perf benefit from #4427 doesn't even apply (there aren't enough child nodes of a single element to trigger anything: the element with the most children is the main element with 1369 children, and the next closest is a table with 64 children). A normal axe run takes 204,914.79ms on that page.
I've tried investigating a different approach to generating unique selectors that still generates a meaningful selector, but they all inevitably fall back to the problem of needing to query the entire page to see if the selector is unique.
In terms of deduplication, I'm not sure that's a significant reason not to switch. Techniques such as Tailwind (where every individual CSS prop is a single class, so updating a style adds or removes class name identifiers) or CSS-in-JS (which produces random class names on every run/build) already cause problems for deduplication.
I'm skeptical about this idea. Deduplication of issues relies on these selectors. We wrote this selector algorithm in large part because it would be more robust than deduplicating based on ancestry. It has the added benefit of being more human-readable. In most places where axe runs, this is a worthwhile tradeoff in my opinion.
I agree with Wilco; I think generating "good" (robust/readable) selectors is too important to trade away by default.
> Techniques such as Tailwind, which has every individual CSS prop as a single class so updating a style either adds or removes class name identifiers, or CSS-in-JS, which produces random class names every run/build, already cause problems for deduplication.
This is a real problem, but I don't think the solution is "stop trying". I think it does motivate strategies for improvement (you know, in our copious spare roadmap space 😄):
- Heuristically detecting and avoiding class names that look like they're affixed with hashes (this is what antonmedv/finder does with its `wordLike` checks)
- Preferring non-class-name features more strongly (like afloesch/css-selector)
- Generating multiple possible selectors - not sure this would be too valuable for display, but this would be very useful for deduplication fingerprinting
- Add customization options for selector generation (autarc/optimal-select has some examples of customizing priorities/ignore patterns/etc)
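On the first of those strategies, a "word-like" check can be as simple as rejecting class names that contain digit runs or random-looking character mixes. The sketch below is an assumption for illustration, similar in spirit to antonmedv/finder's `wordLike` check but not its actual implementation:

```javascript
// Sketch of a heuristic for skipping generated class names. Accept class
// names made of purely alphabetic segments separated by - or _ (e.g.
// "site-header", "btn_primary"); reject anything containing digits or
// other characters, which is typical of CSS-in-JS hash suffixes.
function looksWordLike(name) {
  return /^[a-zA-Z]+([-_][a-zA-Z]+)*$/.test(name);
}

console.log(looksWordLike('site-header')); // true
console.log(looksWordLike('css-1q2w3e'));  // false (hash-like suffix)
console.log(looksWordLike('sc_AbK9xQ'));   // false
```

A dictionary- or entropy-based check would catch more cases, but even a crude segment test like this filters out most build-generated class names.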
> they all inevitably fall back to the problem of needing to query the entire page to see if the selector is unique.
What if we could get around this? Would it be possible to build up a more complete tree-aware cache of which features map to which elements, such that we could guarantee uniqueness without going back and doing a querySelectorAll check?
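One possible shape for such a cache: a single pass over the tree counts how many elements carry each feature (id, tag, class), after which a feature is known to be document-unique exactly when its count is 1, with no querySelectorAll call. This is a sketch of the idea, not axe-core's API; plain objects stand in for DOM nodes and all names are illustrative:

```javascript
// One walk over the tree builds a Map from feature key to occurrence count.
function buildFeatureCounts(root) {
  const counts = new Map();
  const bump = key => counts.set(key, (counts.get(key) || 0) + 1);
  const walk = node => {
    bump('tag:' + node.tag);
    if (node.id) bump('id:' + node.id);
    for (const cls of node.classes || []) bump('class:' + cls);
    for (const child of node.children || []) walk(child);
  };
  walk(root);
  return counts;
}

// Uniqueness check is now an O(1) Map lookup instead of a DOM query.
const isUnique = (counts, key) => counts.get(key) === 1;

const tree = {
  tag: 'body',
  children: [
    { tag: 'div', id: 'main', classes: ['col'], children: [] },
    { tag: 'div', classes: ['col'], children: [] }
  ]
};
const counts = buildFeatureCounts(tree);
console.log(isUnique(counts, 'id:main'));   // true
console.log(isUnique(counts, 'class:col')); // false
```

The open question is cache invalidation on mutating pages; within a single axe run against a stable snapshot, one up-front walk amortizes across every selector generated.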
Here is what Copilot says:
Here's a focused analysis of the performance of the code in `lib/core/utils/get-selector.js`, identifying concrete areas for improvement without changing the selectors produced:
1. Memoization is already used, but can be improved
- Current state:
  - The main `getSelector` export is already wrapped in `memoize`. `findSimilar` is memoized as well.
- Potential improvement:
  - Review the granularity and cache size of `memoize`: ensure it's LRU or appropriately limited to avoid memory bloat, especially on large/long-lived pages.
  - Consider memoizing other pure helper functions like `escapeAttribute`, `getAttributeNameValue`, and `filterAttributes` if they are called repeatedly with the same inputs (especially in large DOMs).
2. Reduce redundant array creation and iteration
- Current state:
  - Many uses of `Array.from()` and `.forEach()` on NodeLists and class lists, e.g. `Array.from(node.classList).forEach(...)` and `Array.from(getNodeAttributes(node)).filter(...).forEach(...)`.
- Improvement:
  - Avoid intermediate arrays: use `for (let ... of ...)` loops directly on iterable objects (classList, NodeList, etc.) to avoid allocating new arrays.
  - Example: `for (let cl of node.classList) { ... }` instead of `Array.from().forEach()`.
3. Optimize attribute filtering and counting
- Current state:
  - For each node, all attributes are retrieved, filtered, and then iterated for counting.
  - `filterAttributes` checks `ignoredAttributes.includes`, which is O(n) for the array.
- Improvement:
  - Change `ignoredAttributes` to a `Set` for O(1) lookups: `const ignoredAttributes = new Set([ ... ]);` then `!ignoredAttributes.has(at.name)`.
  - If possible, combine the filter and count logic to avoid multiple passes.
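The two suggestions above can be combined into one pass. This is a minimal sketch, assuming attribute objects with the `{ name, value }` shape of DOM Attr nodes; the function name and the contents of the ignored set are illustrative, not axe-core's actual code:

```javascript
// A Set gives O(1) membership checks for ignored attribute names.
const ignoredAttributes = new Set(['id', 'class', 'style']);

// Single pass that both filters ignored attributes and counts the rest,
// instead of a .filter() pass followed by a counting pass.
function filterAndCountAttributes(attributes, attributeCounts) {
  const kept = [];
  for (const at of attributes) {
    if (ignoredAttributes.has(at.name)) continue;
    kept.push(at);
    const key = `${at.name}=${at.value}`;
    attributeCounts.set(key, (attributeCounts.get(key) || 0) + 1);
  }
  return kept;
}

const attributeCounts = new Map();
const kept = filterAndCountAttributes(
  [
    { name: 'id', value: 'x' },
    { name: 'role', value: 'button' },
    { name: 'role', value: 'button' }
  ],
  attributeCounts
);
console.log(kept.length);                      // 2
console.log(attributeCounts.get('role=button')); // 2
```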
4. Optimize tag and class counting
- Current state:
  - Looks up and increments counters in plain objects for tags and classes.
- Improvement:
  - If many different tag/class names are encountered, consider using `Map` instead of plain objects for slightly better performance and to avoid prototype pollution issues.
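A short sketch of why the `Map` suggestion matters beyond speed: plain-object counters break on keys that collide with `Object.prototype` (e.g. a class literally named `constructor`), while a `Map` handles arbitrary strings safely. The function name here is illustrative:

```javascript
// Count occurrences of arbitrary string keys with a Map.
function countOccurrences(names) {
  const counts = new Map();
  for (const name of names) {
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return counts;
}

// With a plain object, obj['constructor'] is already truthy (it inherits
// Object.prototype.constructor), so `(obj[k] || 0) + 1` would misbehave.
// A Map has no inherited keys, so the count starts cleanly at 1.
const tagCounts = countOccurrences(['div', 'span', 'div', 'constructor']);
console.log(tagCounts.get('div'));         // 2
console.log(tagCounts.get('constructor')); // 1
```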
5. Short-circuit and avoid unnecessary computation
- Current state:
  - Some operations are performed even if results may not be used.
- Improvement:
  - In `getSelectorData`, skip class/attribute/tag counting if the node has none.
  - In the selector construction, bail out early if a unique ID is found.
  - In `getNthChildString`, avoid calling `matchesSelector` for all siblings if you can determine uniqueness by index.
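The `getNthChildString` point can be sketched as follows. This is an assumption about the intent, not axe-core's actual implementation; plain objects stand in for DOM nodes:

```javascript
// If no sibling shares the node's tag, the tag alone is unambiguous and no
// :nth-child suffix is needed; otherwise the sibling index determines the
// position directly, with no per-sibling matchesSelector calls.
function getNthChildString(node) {
  const siblings = node.parent ? node.parent.children : [node];
  const sameTag = siblings.filter(s => s.tag === node.tag);
  if (sameTag.length === 1) return '';
  return `:nth-child(${siblings.indexOf(node) + 1})`;
}

const parent = { children: [] };
const heading = { tag: 'h1', parent };
const para1 = { tag: 'p', parent };
const para2 = { tag: 'p', parent };
parent.children.push(heading, para1, para2);

console.log(getNthChildString(heading)); // '' (only h1 among siblings)
console.log(getNthChildString(para2));   // ':nth-child(3)'
```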
6. Minimize repeated DOM queries
- Current state:
  - `doc.querySelectorAll(id)` is called in `getElmId` just to count matches.
- Improvement:
  - Early-exit on the second match found (i.e., don't count all, stop at 2). Replace `doc.querySelectorAll(id).length === 1` with `doc.querySelector(id) && !doc.querySelector(id + ':not(:first-of-type)')`, or iterate manually and stop at 2.
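The "iterate manually and stop at 2" variant is the more reliable of the two (the `:not(:first-of-type)` trick depends on sibling structure). A minimal sketch of the early-exit idea, with a plain array scan standing in for the DOM walk:

```javascript
// Return true if at most one item satisfies the predicate, stopping the
// scan as soon as a second match is found instead of counting them all.
function hasAtMostOneMatch(items, predicate) {
  let seen = 0;
  for (const item of items) {
    if (predicate(item)) {
      seen += 1;
      if (seen > 1) return false; // early exit on the second match
    }
  }
  return true;
}

const ids = ['header', 'main', 'main', 'footer'];
console.log(hasAtMostOneMatch(ids, id => id === 'header')); // true
console.log(hasAtMostOneMatch(ids, id => id === 'main'));   // false
```

In the real code this would iterate matching elements lazily (e.g. with a TreeWalker or a manual traversal) rather than materializing the full `querySelectorAll` result.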
7. Avoid unnecessary string concatenations
- Current state:
  - String concatenation is used repeatedly in selector building.
- Improvement:
  - Use array-join patterns in loops where multiple features are concatenated, though the impact is minor compared to DOM ops.
8. Batch DOM operations if possible
- Current state:
  - Many calls to `querySelectorAll`, which are expensive.
- Improvement:
  - When looking for similar elements, batch selectors if possible; but this may not apply if each selector is unique.
9. Throttle or debounce repeated calls
- Current state:
  - Not present.
- Improvement:
  - If the main function is called rapidly (e.g., during live inspection), consider debouncing or batching calls at a higher level.
Summary Table
| Area | Current Approach | Suggested Improvement |
|---|---|---|
| Memoization | Some functions memoized | Memoize more pure helpers, review strategy |
| Array iteration | Array.from + forEach | Use for..of directly |
| Attribute filtering | Array + includes (O(n)) | Use Set for O(1) lookup |
| Counting tags/classes | Object | Use Map for large/unknown keys |
| DOM query for IDs | querySelectorAll (all) | Early exit or minimal scan |
| String concatenation | Repeated + | Use arrays + join (minor) |
| Repeated DOM ops | Many querySelectorAll | Batch/short-circuit where possible |
| Throttle/debounce | Not present | Consider if usage is rapid |
Implementing the above will improve performance, especially on large/complex DOMs, while preserving selector correctness.