[Enhancement] Robust Snapshot Architecture: Dual-Source Accessibility, Defensive Href Extraction & Download Link Detection
Related Issues
- #363 - Accessibility tree/element(s) snapshot
- #284 - Download folder configuration for automation
- Puppeteer #6311 - URL attribute for links
Problem Statement
While working on browser automation for AI agents, we identified several reliability gaps in snapshot/extraction that affect real-world usage:
- Href extraction edge cases - Some dynamically-rendered links or SPAs don't expose `href` reliably through the accessibility tree alone
- No proactive download identification - Agents must parse snapshot text manually to find downloadable files
- Single-source accessibility - Relying solely on Puppeteer's snapshot can miss semantics (as discussed in #363)
- Snapshot fragility - Individual element failures can break the entire snapshot
Proposed Improvements
We've implemented and deployed solutions for these (running in Azure production):
1. Dual-Fallback Href Extraction
```js
// Runtime.callFunctionOn with fallback
return this.href || this.getAttribute('href') || '';
```
This handles edge cases where the standard property read fails (related to the discussion in Puppeteer #6311).
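For context, a minimal sketch of how this fallback could be wired up over CDP; the `extractHref` helper and the use of `DOM.resolveNode` on a `backendNodeId` are illustrative assumptions, not the exact production code:

```ts
// Minimal sketch (assumed helper, not the production code): resolve a backendNodeId
// from the accessibility tree to a remote object, then read href with a fallback.
import type { CDPSession } from 'puppeteer';

async function extractHref(client: CDPSession, backendNodeId: number): Promise<string> {
  // Turn the DOM node into a Runtime remote object we can call a function on.
  const { object } = await client.send('DOM.resolveNode', { backendNodeId });
  if (!object.objectId) return '';

  // Prefer the resolved property, then fall back to the raw attribute for links
  // that SPAs render without a fully resolved href.
  const { result } = await client.send('Runtime.callFunctionOn', {
    objectId: object.objectId,
    functionDeclaration: `function () {
      return this.href || this.getAttribute('href') || '';
    }`,
    returnByValue: true,
  });
  return typeof result.value === 'string' ? result.value : '';
}
```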
2. Explicit `downloadLinks` Field
```ts
interface SnapshotResult {
  // ... existing fields
  downloadLinks: Array<{
    url: string;
    filename: string;
    extension: string;
  }>;
}
```
Automatically identifies downloadable files by extension (.csv, .xlsx, .zip, .pdf, .json, etc.). Agents no longer need to parse text manually.
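As a rough illustration, the detection can be as simple as the following sketch; the extension list and the `toDownloadLink` helper are hypothetical, not the deployed implementation:

```ts
// Rough sketch only: extension-based classification of hrefs into download links.
// The extension list and helper name are illustrative, not the deployed code.
const DOWNLOAD_EXTENSIONS = new Set(['csv', 'xlsx', 'zip', 'pdf', 'json']);

function toDownloadLink(href: string): { url: string; filename: string; extension: string } | null {
  try {
    const { pathname } = new URL(href);
    const filename = decodeURIComponent(pathname.split('/').pop() ?? '');
    const extension = filename.includes('.') ? filename.split('.').pop()!.toLowerCase() : '';
    return DOWNLOAD_EXTENSIONS.has(extension) ? { url: href, filename, extension } : null;
  } catch {
    return null; // Relative or malformed hrefs would need resolving against the page URL first.
  }
}
```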
3. Dual-Source Accessibility Tree
| Source | Purpose |
|---|---|
| Puppeteer `page.accessibility.snapshot()` | Semantic structure |
| CDP `backendNodeId` | Precise DOM element mapping |
This addresses the gaps @BogdanCerovac identified in #363 - combining semantic accessibility with precise DOM mapping.
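A minimal sketch of the dual-source idea, assuming Puppeteer's `page.accessibility.snapshot()` for semantics plus the raw CDP `Accessibility.getFullAXTree` (whose nodes carry `backendDOMNodeId`) for DOM mapping; the merged shape is illustrative:

```ts
// Minimal sketch of combining both sources; the merged shape is illustrative.
import type { Page } from 'puppeteer';

async function dualSourceSnapshot(page: Page) {
  // Source 1: Puppeteer's semantic snapshot (roles, names, "interesting" nodes only).
  const semanticTree = await page.accessibility.snapshot();

  // Source 2: the raw CDP AX tree, whose nodes carry backendDOMNodeId for DOM mapping.
  const client = await page.createCDPSession();
  const { nodes } = await client.send('Accessibility.getFullAXTree');

  const byBackendNodeId = new Map<number, (typeof nodes)[number]>();
  for (const node of nodes) {
    if (node.backendDOMNodeId !== undefined) {
      byBackendNodeId.set(node.backendDOMNodeId, node);
    }
  }

  return { semanticTree, byBackendNodeId };
}
```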
4. Resilient Error Handling
```js
// Continue on individual element failures
for (const node of nodes) {
  try {
    await extractNodeData(node);
  } catch (e) {
    console.warn(`Skipping node: ${e.message}`);
    continue; // Don't fail entire snapshot
  }
}
```
Implementation
We have a working implementation deployed in production. Happy to:
- [ ] Submit a PR with these improvements
- [ ] Provide more technical details on any specific aspect
- [ ] Discuss alternative approaches
Questions for Maintainers
- Would you prefer these as separate PRs or one consolidated change?
- For `downloadLinks` - should this be opt-in via a parameter or always included?
- Any concerns about the dual-source approach adding complexity?
/cc @OrKoN
Thanks for filing a feature request first. Generally, we would want to extend the snapshot with DOM data for some use cases like styling debugging, but we want to engineer it in a way that performance and stability do not degrade (e.g., by exposing the required capabilities via CDP or by using isolated worlds instead of script evaluations in the main world).
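For illustration only, a minimal sketch of the isolated-world route mentioned above, using the CDP `Page.createIsolatedWorld` and `Runtime.evaluate` commands; the world name and helper are assumptions:

```ts
// Minimal sketch of evaluating in an isolated world, assuming a CDPSession and a known
// frameId; the world name and helper are illustrative.
import type { CDPSession } from 'puppeteer';

async function evaluateInIsolatedWorld(client: CDPSession, frameId: string, expression: string) {
  // Create a separate execution context that page scripts cannot see or interfere with.
  const { executionContextId } = await client.send('Page.createIsolatedWorld', {
    frameId,
    worldName: '__snapshot_helpers__',
  });

  // Run the extraction script in that context instead of the main world.
  const { result } = await client.send('Runtime.evaluate', {
    expression,
    contextId: executionContextId,
    returnByValue: true,
  });
  return result.value;
}
```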
The use cases you describe, though, might not justify the added complexity yet. The goal of this MCP server is to aid debugging and testing of applications to ensure they work well and follow best practices. Therefore, my first suggestion would be to fix the apps in question and make the download links accessible. Second, another alternative I would suggest is teaching your agent to extract the links and hrefs the way you want using the generic script execution tool. WDYT? cc @natorion
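For example, a page-side snippet along these lines (a minimal sketch; the selector and extension list are illustrative) could be run through the generic script execution tool to collect download links without extending the snapshot:

```ts
// Minimal sketch of a script an agent could evaluate in the page itself; the selector
// and extension list are illustrative.
const collectDownloadLinks = () => {
  const extensions = ['.csv', '.xlsx', '.zip', '.pdf', '.json'];
  return Array.from(document.querySelectorAll<HTMLAnchorElement>('a[href]'))
    .map((a) => a.href)
    .filter((href) => {
      try {
        const pathname = new URL(href).pathname.toLowerCase();
        return extensions.some((ext) => pathname.endsWith(ext));
      } catch {
        return false; // Skip hrefs that are not valid absolute URLs.
      }
    });
};
```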
P.S. Doing this node by node looks inefficient. We should at least pre-filter potential DOM nodes and then collect data for all of them with at most one script eval.
```js
// Continue on individual element failures
for (const node of nodes) {
  try {
    await extractNodeData(node);
  } catch (e) {
    console.warn(`Skipping node: ${e.message}`);
    continue; // Don't fail entire snapshot
  }
}
```
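To make the batching idea concrete, here is a rough sketch under assumed names (the `collectLinkData` helper and the text-based matching are illustrative; a real version would match on `backendNodeId` instead):

```ts
// Rough sketch of the batched alternative: pre-filter AX nodes, then collect link data
// for the whole page in one evaluation. Matching by accessible name is illustrative;
// a real implementation would match on backendNodeId instead.
import type { Page, SerializedAXNode } from 'puppeteer';

async function collectLinkData(page: Page, axNodes: SerializedAXNode[]) {
  // Pre-filter: only nodes that can plausibly carry an href.
  const linkNodes = axNodes.filter((node) => node.role === 'link');

  // One page-side evaluation for all links instead of one call per node.
  const pageLinks = await page.evaluate(() =>
    Array.from(document.querySelectorAll<HTMLAnchorElement>('a[href]')).map((a) => ({
      text: a.textContent?.trim() ?? '',
      href: a.href || a.getAttribute('href') || '',
    })),
  );

  return linkNodes.map((node) => ({
    name: node.name,
    href: pageLinks.find((link) => link.text === node.name)?.href ?? '',
  }));
}
```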