Session Replay - Snapshot mode v2
Description
Problem
The current snapshot mode implementation for Session Replay involves:
1. Capturing a full-window screenshot.
2. Traversing the view hierarchy to locate maskable elements and record their global coordinates and size.
3. Overlaying black masks on top of the captured screenshot based on these coordinates.
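The three steps above can be modeled with a toy, platform-agnostic sketch. This is not the SDK's code, and everything in it is hypothetical. `View` is a minimal stand-in for a real view: a rectangle filled with one character that represents its pixels. The canvas is a grid of characters, and `#` stands for a black mask. Step 3 is deliberately naive and skips any occlusion heuristics the real implementation may apply.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, not a real UIKit/Android/SDK type.
# A view is a rectangle (relative to its parent) filled with one character
# that stands in for its rendered pixels.
@dataclass
class View:
    x: int
    y: int
    w: int
    h: int
    fill: str
    sensitive: bool = False
    children: list = field(default_factory=list)


def fill_rect(canvas, x, y, w, h, ch):
    """Paint a rectangle into the canvas, clipped to the canvas bounds."""
    for row in range(max(y, 0), min(y + h, len(canvas))):
        for col in range(max(x, 0), min(x + w, len(canvas[0]))):
            canvas[row][col] = ch


def take_screenshot(root, width, height):
    """Step 1: stand-in for the full-window screenshot (everything unmasked)."""
    canvas = [["." for _ in range(width)] for _ in range(height)]

    def draw(view, ox, oy):
        gx, gy = ox + view.x, oy + view.y
        fill_rect(canvas, gx, gy, view.w, view.h, view.fill)
        for child in view.children:  # later siblings paint over earlier ones
            draw(child, gx, gy)

    draw(root, 0, 0)
    return canvas


def collect_mask_rects(root):
    """Step 2: walk the hierarchy and record global rects of sensitive views."""
    rects = []

    def walk(view, ox, oy):
        gx, gy = ox + view.x, oy + view.y
        if view.sensitive:
            rects.append((gx, gy, view.w, view.h))
        for child in view.children:
            walk(child, gx, gy)

    walk(root, 0, 0)
    return rects


def snapshot_v1(root, width, height):
    """Step 3: overlay black masks ('#') on the screenshot, with no occlusion check."""
    canvas = take_screenshot(root, width, height)
    for x, y, w, h in collect_mask_rects(root):
        fill_rect(canvas, x, y, w, h, "#")
    return ["".join(row) for row in canvas]


# A sensitive label fully covered by an opaque modal added after it.
label = View(1, 1, 3, 1, "T", sensitive=True)
modal = View(0, 0, 8, 3, "M")
root = View(0, 0, 8, 3, ".", children=[label, modal])
print(snapshot_v1(root, 8, 3))
# ['MMMMMMMM', 'M###MMMM', 'MMMMMMMM']
```

In this naive model the mask lands on top of the opaque modal even though the label underneath is not visible at all. Without a reliable notion of z-order, step 3 cannot tell whether a mask is still needed.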
While effective in many cases, this approach has a few limitations:
- Occlusion Ambiguity: It’s difficult to determine whether a view is fully or partially visible because there is no concrete z-index system. For example, a modal may partially cover a view while being semi-transparent, so the view underneath can still show through.
- Transition Glitches: UI transitions are problematic for snapshot consistency. Our current workaround disables snapshots during transitions, which can lead to gaps in the session replay.
- Syncing Issues: Rapid UI changes (e.g., fast scrolling) can cause the base screenshot and the mask overlays to fall out of sync. Resolving this would likely mean blocking the main thread even longer, which we want to avoid.
⸻
Proposal: Snapshot v2 – View-by-View Compositing
A more reliable and flexible snapshot mode would:
- Traverse the full view hierarchy (as currently done to find maskable views anyway).
- For each (sub)view:
  - Capture a snapshot of the view itself, or of a masked counterpart if the view is sensitive (e.g. text rendered as *****).
- Composite each snapshot into a shared context/canvas, positioned according to its view's global coordinates.
- Output a single image representing the full-window snapshot.
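The steps above can be sketched with a toy model. This is an illustration of the idea, not the proof-of-concept code. `View`, `snapshot_v2`, and `MASK` are hypothetical names. Each view's "snapshot" is a solid rectangle of one character, and semi-transparency is not modeled. Masking a sensitive view also skips its subtree, which is an assumption the proposal does not state. A real implementation would draw actual view snapshots into a graphics context instead of characters into a grid.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of a view tree, not a real UIKit/Android/SDK type.
@dataclass
class View:
    x: int          # position relative to the parent
    y: int
    w: int
    h: int
    fill: str       # one character standing in for the view's rendered pixels
    sensitive: bool = False
    children: list = field(default_factory=list)


MASK = "#"


def snapshot_v2(root, width, height):
    """Composite view by view into one shared canvas, in hierarchy order."""
    canvas = [["." for _ in range(width)] for _ in range(height)]

    def paint(view, ox, oy):
        gx, gy = ox + view.x, oy + view.y
        # Sensitive views are replaced by their masked counterpart, so their
        # real content is never drawn. Assumption: the subtree is skipped too.
        ch = MASK if view.sensitive else view.fill
        for row in range(max(gy, 0), min(gy + view.h, height)):
            for col in range(max(gx, 0), min(gx + view.w, width)):
                canvas[row][col] = ch
        if view.sensitive:
            return
        # Children and later siblings are drawn afterwards, so whatever is in
        # front naturally paints over whatever is behind it.
        for child in view.children:
            paint(child, gx, gy)

    paint(root, 0, 0)
    return ["".join(row) for row in canvas]


# An opaque modal fully covering a sensitive label: no mask shows through.
label = View(1, 1, 3, 1, "T", sensitive=True)
full_modal = View(0, 0, 8, 3, "M")
print(snapshot_v2(View(0, 0, 8, 3, ".", children=[label, full_modal]), 8, 3))
# ['MMMMMMMM', 'MMMMMMMM', 'MMMMMMMM']

# A modal covering only the left edge: the mask appears only where the label is visible.
half_modal = View(0, 0, 2, 3, "M")
print(snapshot_v2(View(0, 0, 8, 3, ".", children=[label, half_modal]), 8, 3))
# ['MM......', 'MM##....', 'MM......']
```

The masked counterpart is drawn in the same traversal as everything else. There is therefore no separate overlay pass that could drift out of step with the pixels, and occlusion follows from draw order without needing a z-index.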
Benefits
✅ Improved Visual Accuracy: Since views are drawn in hierarchy order, views in front naturally paint over the views behind them, which accounts for occlusion, z-order, and partial visibility.
✅ No Sensitive Data Leaks: Masked views are never captured unmasked, preventing accidental exposure of text/images.
✅ Better Transition Handling: Layered drawing should help avoid the need to pause snapshotting during transitions.
✅ More Robust Scrolling Support: Each mask is drawn in the same pass, at the same position, as the view it replaces, so view-specific snapshots are less prone to visual desync than separate overlays.
Risks & Considerations
- Performance: Needs profiling. Compositing many views may turn out to be slower or heavier on memory than taking a single screenshot.
- Edge Cases: Views using custom rendering (e.g. OpenGL layers, blur effects) might not snapshot properly.
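A back-of-the-envelope way to reason about the memory risk, assuming uncompressed 8-bit RGBA bitmaps (4 bytes per pixel). The helper names and the example sizes are illustrative, not measurements. The point is that any per-view bitmaps come on top of the full-window canvas, and the peak depends on whether they are all kept alive until compositing finishes.

```python
BYTES_PER_PIXEL = 4  # assumption: uncompressed 8-bit RGBA buffers


def buffer_bytes(width_px: int, height_px: int) -> int:
    """Size of one uncompressed RGBA bitmap, in bytes."""
    return width_px * height_px * BYTES_PER_PIXEL


def peak_bytes(window: tuple[int, int],
               view_sizes: list[tuple[int, int]],
               keep_intermediates: bool) -> int:
    """Rough peak memory for one view-by-view snapshot (hypothetical helper).

    window: (width_px, height_px) of the shared canvas.
    view_sizes: (width_px, height_px) of each per-view bitmap.
    keep_intermediates: True if every per-view bitmap is held until compositing
    ends; False if each one is released right after being composited, so at
    most one is alive at a time.
    """
    canvas = buffer_bytes(*window)
    if not view_sizes:
        return canvas
    per_view = [buffer_bytes(w, h) for w, h in view_sizes]
    return canvas + (sum(per_view) if keep_intermediates else max(per_view))


# Illustrative numbers: a 1170 x 2532 px window with 20 full-width 200 px rows.
window = (1170, 2532)
rows = [(1170, 200)] * 20
print(buffer_bytes(*window))              # 11849760
print(peak_bytes(window, rows, True))     # 30569760
print(peak_bytes(window, rows, False))    # 12785760
```

Drawing each view directly into the shared context, without an intermediate bitmap, should keep the peak close to the canvas alone. Releasing intermediates promptly bounds the extra cost by the largest single view.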
Tasks
- [x] Create a proof-of-concept for compositing snapshots via the view hierarchy.
- [ ] Benchmark performance across key devices (low-end, mid-range, high-end).
- [ ] Validate masking logic and ensure nothing leaks through in edge cases.
- [ ] Test against fast transitions and scroll-heavy views.
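One way the masking validation could be automated, sketched here as a suggestion and not an existing test. Sensitive content is rendered with a sentinel value that must never reach the output. The scene is composited, and the test asserts the sentinel is absent. `SECRET`, `composite`, `leaks`, and the scenes are hypothetical. A minimal toy compositor is included so the sketch runs on its own. It masks a sensitive view together with its subtree, which is an assumption. The last scene is a negative control that shows the check actually fires.

```python
from dataclasses import dataclass, field

SECRET = "S"  # hypothetical sentinel standing in for sensitive pixels
MASK = "#"


@dataclass
class View:
    x: int
    y: int
    w: int
    h: int
    fill: str
    sensitive: bool = False
    children: list = field(default_factory=list)


def composite(root, width, height):
    """Minimal view-by-view compositor (toy model, not the SDK's code)."""
    canvas = [["." for _ in range(width)] for _ in range(height)]

    def paint(v, ox, oy):
        gx, gy = ox + v.x, oy + v.y
        ch = MASK if v.sensitive else v.fill
        for r in range(max(gy, 0), min(gy + v.h, height)):
            for c in range(max(gx, 0), min(gx + v.w, width)):
                canvas[r][c] = ch
        if not v.sensitive:  # masked subtrees are never descended into
            for child in v.children:
                paint(child, gx, gy)

    paint(root, 0, 0)
    return ["".join(row) for row in canvas]


def leaks(rows):
    """Return (row, col) of every pixel where sentinel content survived."""
    return [(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == SECRET]


cases = {
    # Secret content inside a masked container, not flagged itself.
    "nested child of sensitive container": View(0, 0, 6, 3, ".", children=[
        View(1, 0, 4, 3, "C", sensitive=True,
             children=[View(1, 1, 2, 1, SECRET)])]),
    # Sensitive view that is only partially on screen.
    "partially off-screen": View(0, 0, 6, 3, ".", children=[
        View(4, 2, 5, 5, SECRET, sensitive=True)]),
    # Negative control: secret content that was never flagged must be reported.
    "unflagged secret (control)": View(0, 0, 6, 3, ".", children=[
        View(0, 0, 2, 1, SECRET)]),
}

for name, root in cases.items():
    print(name, leaks(composite(root, 6, 3)))
```

In a real test suite, each edge case from the list above would become one scene, and only the control should report leaked pixels.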
Proof of concept
There's a proof of concept in this branch here
> Performance: Needs profiling – compositing many views may prove to be slower or heavier on memory?
Yep, I ran some tests and ended up with a higher memory footprint, but I didn't try to optimize it, and the current version was already working fine (apart from the side effects we now know about).