Use ancestor selectors instead of CSS selectors for results
Currently axe-core generates a unique CSS selector for each node returned in the results. Unfortunately, this selector generation is the #1 performance bottleneck in axe-core, especially for pages with a large number of DOM nodes. We've investigated trying to speed up selector generation, but our main problem is that any change to the selectors is a breaking change: if a node produced one CSS selector before, it cannot produce a different one afterwards.
For v5, we should stop generating CSS selectors and instead use the ancestor tree for the target value. This has been shown to be a huge performance gain.
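For context, an ancestry-style target can be computed purely from each node's position in the tree, with no uniqueness query against the document. The sketch below illustrates the idea (it is not axe-core's actual implementation, and the node shape and function names are assumptions for illustration; plain objects stand in for DOM nodes so it runs outside a browser):

```javascript
// Sketch of ancestry-style selector generation. Each step is
// tagName:nth-child(n), computed purely from the node's position,
// so no querySelectorAll uniqueness check is ever needed.

function nthChildIndex(node) {
  // 1-based position among the parent's children
  return node.parent ? node.parent.children.indexOf(node) + 1 : 1;
}

function getAncestrySelector(node) {
  const path = [];
  for (let current = node; current; current = current.parent) {
    path.unshift(`${current.tag}:nth-child(${nthChildIndex(current)})`);
  }
  return path.join(' > ');
}

// Build a tiny tree: html > body > (div, div > p)
const html = { tag: 'html', parent: null, children: [] };
const body = { tag: 'body', parent: html, children: [] };
html.children.push(body);
const div1 = { tag: 'div', parent: body, children: [] };
const div2 = { tag: 'div', parent: body, children: [] };
body.children.push(div1, div2);
const p = { tag: 'p', parent: div2, children: [] };
div2.children.push(p);

console.log(getAncestrySelector(p));
// → html:nth-child(1) > body:nth-child(1) > div:nth-child(2) > p:nth-child(1)
```

The whole computation is a single walk from the node to the root, which is why it scales with tree depth rather than document size.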
I'm a bit confused; are you suggesting that we do this just as an optimization for cases that involve DOMs with very high branching factors, like your investigation in https://github.com/dequelabs/axe-core/issues/4427 was suggesting, or are you actually suggesting that we stop generating non-ancestry selectors at all by default? I very much don't think we should do the latter; they're a lot less meaningful to look at, a lot longer to have to put in result displays, and much more brittle for deduplication purposes.
The latter: stop generating non-ancestry selectors by default. The problem is that getSelectors is the slowest part of the code, and the fix from #4427 only helps certain DOM structures (when there are >2k sibling DOM nodes). Switching to ancestry selectors is a significant speed increase in all cases, regardless of the DOM structure.
For example, #4427 uses a page with 100k nodes, so I tested the cached childIndex code on my playground with 100k nodes, and it took axe 1,997,015.82ms to complete (that's not a typo). Running the same page using only ancestry generation took 239,628.01ms. Not having to call querySelectorAll every time we want to check if a selector is unique (even with IDs there's no guarantee the selector is unique) saves a significant amount of time.
For a real-world test, I used https://s3.amazonaws.com/content.stockpr.com/sec/0000070858-17-000025/bac-331201710xq.htm, which has 150k nodes. Ancestry took 119,380.60ms to complete, while the perf benefit from #4427 doesn't even apply (there aren't enough child nodes of a single element to trigger anything: the element with the most children is the main element with 1369 children, and the next closest is a table with 64 children). A normal axe run takes 204,914.79ms on that page.
I've tried investigating a different approach to generating unique selectors that still generates a meaningful selector, but they all inevitably fall back to the problem of needing to query the entire page to see if the selector is unique.
In terms of deduplication, I'm not sure that's a significant reason not to switch. Techniques such as Tailwind (where every individual CSS prop is a single class, so updating a style adds or removes class name identifiers) or CSS-in-JS (which produces random class names on every run/build) already cause problems for deduplication.
I'm skeptical about this idea. Deduplication of issues relies on these selectors. We wrote this selector algorithm in large part because it would be more robust than deduplicating based on ancestry. It has the added benefit of being more human-readable. In most places where axe runs, this is a worthwhile tradeoff in my opinion.
I agree with Wilco; I think generating "good" (robust/readable) selectors is too important to trade away by default.
> Techniques such as Tailwind, which has every individual CSS prop as a single class so updating a style either adds or removes class name identifiers, or CSS-in-JS, which produces random class names every run/build, already cause problems for deduplication.
This is a real problem, but I don't think the solution is "stop trying". I think it does motivate strategies for improvement (you know, in our copious spare roadmap space 😄):
- Heuristically detecting and avoiding class names that look like they're affixed with hashes (this is what antonmedv/finder does with its `wordLike` checks)
- Preferring non-class-name features more strongly (like afloesch/css-selector)
- Generating multiple possible selectors - not sure this would be too valuable for display, but this would be very useful for deduplication fingerprinting
- Add customization options for selector generation (autarc/optimal-select has some examples of customizing priorities/ignore patterns/etc)
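On the first of those strategies, a "word-like" check can be as simple as rejecting class names that contain digit runs or random-looking character mixes. The sketch below is an assumption for illustration, similar in spirit to antonmedv/finder's `wordLike` check but not its actual implementation:

```javascript
// Sketch of a heuristic for skipping generated class names. Accept class
// names made of purely alphabetic segments separated by - or _ (e.g.
// "site-header", "btn_primary"); reject anything containing digits or
// other characters, which is typical of CSS-in-JS hash suffixes.
function looksWordLike(name) {
  return /^[a-zA-Z]+([-_][a-zA-Z]+)*$/.test(name);
}

console.log(looksWordLike('site-header')); // true
console.log(looksWordLike('css-1q2w3e'));  // false (hash-like suffix)
console.log(looksWordLike('sc_AbK9xQ'));   // false
```

A dictionary- or entropy-based check would catch more cases, but even a crude segment test like this filters out most build-generated class names.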
> they all inevitably fall back to the problem of needing to query the entire page to see if the selector is unique.
What if we could get around this? Would it be possible to build up a more complete tree-aware cache of which features map to which elements, such that we could guarantee uniqueness without going back and doing a querySelectorAll check?
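One possible shape for such a cache: a single pass over the tree counts how many elements carry each feature (id, tag, class), after which a feature is known to be document-unique exactly when its count is 1, with no querySelectorAll call. This is a sketch of the idea, not axe-core's API; plain objects stand in for DOM nodes and all names are illustrative:

```javascript
// One walk over the tree builds a Map from feature key to occurrence count.
function buildFeatureCounts(root) {
  const counts = new Map();
  const bump = key => counts.set(key, (counts.get(key) || 0) + 1);
  const walk = node => {
    bump('tag:' + node.tag);
    if (node.id) bump('id:' + node.id);
    for (const cls of node.classes || []) bump('class:' + cls);
    for (const child of node.children || []) walk(child);
  };
  walk(root);
  return counts;
}

// Uniqueness check is now an O(1) Map lookup instead of a DOM query.
const isUnique = (counts, key) => counts.get(key) === 1;

const tree = {
  tag: 'body',
  children: [
    { tag: 'div', id: 'main', classes: ['col'], children: [] },
    { tag: 'div', classes: ['col'], children: [] }
  ]
};
const counts = buildFeatureCounts(tree);
console.log(isUnique(counts, 'id:main'));   // true
console.log(isUnique(counts, 'class:col')); // false
```

The open question is cache invalidation on mutating pages; within a single axe run against a stable snapshot, one up-front walk amortizes across every selector generated.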
Here is what Copilot says:
Here's a focused analysis of the performance of the code in `lib/core/utils/get-selector.js`, identifying concrete areas for improvement without changing the selectors produced:
1. Memoization is already used, but can be improved
- Current state:
  - The main `getSelector` export is already wrapped in `memoize`. `findSimilar` is memoized as well.
- Potential improvement:
  - Review the granularity and cache size of `memoize`: ensure it's LRU or appropriately limited to avoid memory bloat, especially on large/long-lived pages.
  - Consider memoizing other pure helper functions like `escapeAttribute`, `getAttributeNameValue`, and `filterAttributes` if they are called repeatedly with the same inputs (especially in large DOMs).
2. Reduce redundant array creation and iteration
- Current state:
  - Many uses of `Array.from()` and `.forEach()` on NodeLists and class lists, e.g. `Array.from(node.classList).forEach(...)` and `Array.from(getNodeAttributes(node)).filter(...).forEach(...)`.
- Improvement:
  - Avoid intermediate arrays: use `for (let ... of ...)` loops directly on iterable objects (classList, NodeList, etc.) to avoid allocating new arrays.
  - Example: `for (let cl of node.classList) { ... }` instead of `Array.from().forEach()`.
3. Optimize attribute filtering and counting
- Current state:
  - For each node, all attributes are retrieved, filtered, and then iterated for counting.
  - `filterAttributes` checks `ignoredAttributes.includes`, which is O(n) for the array.
- Improvement:
  - Change `ignoredAttributes` to a `Set` for O(1) lookups: `const ignoredAttributes = new Set([ ... ]);` then `!ignoredAttributes.has(at.name)`.
  - If possible, combine the filter and count logic to avoid multiple passes.
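The two suggestions above can be combined into one pass. This is a minimal sketch, assuming attribute objects with the `{ name, value }` shape of DOM Attr nodes; the function name and the contents of the ignored set are illustrative, not axe-core's actual code:

```javascript
// A Set gives O(1) membership checks for ignored attribute names.
const ignoredAttributes = new Set(['id', 'class', 'style']);

// Single pass that both filters ignored attributes and counts the rest,
// instead of a .filter() pass followed by a counting pass.
function filterAndCountAttributes(attributes, attributeCounts) {
  const kept = [];
  for (const at of attributes) {
    if (ignoredAttributes.has(at.name)) continue;
    kept.push(at);
    const key = `${at.name}=${at.value}`;
    attributeCounts.set(key, (attributeCounts.get(key) || 0) + 1);
  }
  return kept;
}

const attributeCounts = new Map();
const kept = filterAndCountAttributes(
  [
    { name: 'id', value: 'x' },
    { name: 'role', value: 'button' },
    { name: 'role', value: 'button' }
  ],
  attributeCounts
);
console.log(kept.length);                      // 2
console.log(attributeCounts.get('role=button')); // 2
```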
4. Optimize tag and class counting
- Current state:
  - Looks up and increments counters in plain objects for tags and classes.
- Improvement:
  - If many different tag/class names are encountered, consider using `Map` instead of plain objects for slightly better performance and to avoid prototype pollution issues.
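A short sketch of why the `Map` suggestion matters beyond speed: plain-object counters break on keys that collide with `Object.prototype` (e.g. a class literally named `constructor`), while a `Map` handles arbitrary strings safely. The function name here is illustrative:

```javascript
// Count occurrences of arbitrary string keys with a Map.
function countOccurrences(names) {
  const counts = new Map();
  for (const name of names) {
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return counts;
}

// With a plain object, obj['constructor'] is already truthy (it inherits
// Object.prototype.constructor), so `(obj[k] || 0) + 1` would misbehave.
// A Map has no inherited keys, so the count starts cleanly at 1.
const tagCounts = countOccurrences(['div', 'span', 'div', 'constructor']);
console.log(tagCounts.get('div'));         // 2
console.log(tagCounts.get('constructor')); // 1
```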
5. Short-circuit and avoid unnecessary computation
- Current state:
  - Some operations are performed even if results may not be used.
- Improvement:
  - In `getSelectorData`, skip class/attribute/tag counting if the node has none.
  - In the selector construction, bail out early if a unique ID is found.
  - In `getNthChildString`, avoid calling `matchesSelector` for all siblings if you can determine uniqueness by index.
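The `getNthChildString` point can be sketched as follows. This is an assumption about the intent, not axe-core's actual implementation; plain objects stand in for DOM nodes:

```javascript
// If no sibling shares the node's tag, the tag alone is unambiguous and no
// :nth-child suffix is needed; otherwise the sibling index determines the
// position directly, with no per-sibling matchesSelector calls.
function getNthChildString(node) {
  const siblings = node.parent ? node.parent.children : [node];
  const sameTag = siblings.filter(s => s.tag === node.tag);
  if (sameTag.length === 1) return '';
  return `:nth-child(${siblings.indexOf(node) + 1})`;
}

const parent = { children: [] };
const heading = { tag: 'h1', parent };
const para1 = { tag: 'p', parent };
const para2 = { tag: 'p', parent };
parent.children.push(heading, para1, para2);

console.log(getNthChildString(heading)); // '' (only h1 among siblings)
console.log(getNthChildString(para2));   // ':nth-child(3)'
```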
6. Minimize repeated DOM queries
- Current state:
  - `doc.querySelectorAll(id)` is called in `getElmId` just to count matches.
- Improvement:
  - Early-exit on the second match found (i.e., don't count all, stop at 2). Replace `doc.querySelectorAll(id).length === 1` with `doc.querySelector(id) && !doc.querySelector(id + ':not(:first-of-type)')`, or iterate manually and stop at 2.
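The "iterate manually and stop at 2" variant is the more reliable of the two (the `:not(:first-of-type)` trick depends on sibling structure). A minimal sketch of the early-exit idea, with a plain array scan standing in for the DOM walk:

```javascript
// Return true if at most one item satisfies the predicate, stopping the
// scan as soon as a second match is found instead of counting them all.
function hasAtMostOneMatch(items, predicate) {
  let seen = 0;
  for (const item of items) {
    if (predicate(item)) {
      seen += 1;
      if (seen > 1) return false; // early exit on the second match
    }
  }
  return true;
}

const ids = ['header', 'main', 'main', 'footer'];
console.log(hasAtMostOneMatch(ids, id => id === 'header')); // true
console.log(hasAtMostOneMatch(ids, id => id === 'main'));   // false
```

In the real code this would iterate matching elements lazily (e.g. with a TreeWalker or a manual traversal) rather than materializing the full `querySelectorAll` result.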
7. Avoid unnecessary string concatenations
- Current state:
  - String concatenation is used repeatedly in selector building.
- Improvement:
  - Use array-join patterns in loops where multiple features are concatenated, though the impact is minor compared to DOM ops.
8. Batch DOM operations if possible
- Current state:
  - Many calls to `querySelectorAll`, which are expensive.
- Improvement:
  - When looking for similar elements, batch selectors if possible; but this may not apply if each selector is unique.
9. Throttle or debounce repeated calls
- Current state:
  - Not present.
- Improvement:
  - If the main function is called rapidly (e.g., during live inspection), consider debouncing or batching calls at a higher level.
Summary Table
| Area | Current Approach | Suggested Improvement |
|---|---|---|
| Memoization | Some functions memoized | Memoize more pure helpers, review strategy |
| Array iteration | Array.from + forEach | Use for..of directly |
| Attribute filtering | Array + includes (O(n)) | Use Set for O(1) lookup |
| Counting tags/classes | Object | Use Map for large/unknown keys |
| DOM query for IDs | querySelectorAll (all) | Early exit or minimal scan |
| String concatenation | Repeated + | Use arrays + join (minor) |
| Repeated DOM ops | Many querySelectorAll | Batch/short-circuit where possible |
| Throttle/debounce | Not present | Consider if usage is rapid |
Implementing the above will improve performance, especially on large/complex DOMs, while preserving selector correctness.