rrweb
rrweb copied to clipboard
text masking settings apply to inputs
- inputs respect text masking options in addition to input masking options
- fixes
maskInputFn
not passed to initial snapshot - fixes #874
- adds a fast return from needMaskingText if maskTextSelector is
*
Hi @mdellanoce, thanks for creating this Pull Request, I think this is a great idea to help give people more privacy and this should definitely be included in rrweb.
I did notice one potential issue with this implementation: the needMaskingText
function has been a performance problem in the past because it gets called on every single node that gets recorded (sometimes tens-of-thousands or even hundreds of thousands of nodes get recorded at once), and because the new needMaskingText
now compares distances for both masked and unmasked parents its doing twice as much work.
Maybe you could implement some escape hatches to simplify the work it has to do, for example if the is no masked parent, then there is no need to figure out what the unmasked distance is.
Thanks again for submitting this, I think if we can get this performant this would be a great addition to rrweb!
@Mark-Fenng @Juice10 thanks for the feedback, totally understand the performance concerns. I think the maskAllText suggestion will work, and I have a few other ideas. I'll see what I can do and hopefully ping you back in a few days.
🦋 Changeset detected
Latest commit: b1a1922e59a123fdeb7bf381df0fd2cd6c3719dd
The changes in this PR will be included in the next version bump.
This PR includes changesets to release 8 packages
Name | Type |
---|---|
rrweb-snapshot | Patch |
rrweb | Patch |
rrdom | Patch |
rrdom-nodejs | Patch |
rrweb-player | Patch |
@rrweb/types | Patch |
@rrweb/web-extension | Patch |
rrvideo | Patch |
Not sure what this means? Click here to learn what changesets are.
Click here if you're a maintainer who wants to add another changeset to this PR
@YunFeng0817 @Juice10 sorry for the delay, but I added 3 commits to address performance:
- if maskAllText is true, and unmasking either is not configured (best case) or fails to find a match (worst case), then the masking distance is not computed, and needsMaskingText returns true early
- I removed the default for
unmaskTextClass
, so that unmasking overhead won't "surprise" anyone with existing setups until they opt-in to use it. - I reworked the distance computation so that it is a single walk from the element to the root node. Each step will check for class match first, then check selectors. Also, in the masking distance computation, if the computed distance at a given step exceeds what was found for the unmasking distance, it'll stop walking upwards immediately and return a non-match.
Hope that all makes sense. Let me know what you think.
@mdellanoce This PR is coming along really nicely. Since performance is such an important part of this it probably makes sense to re-run the benchmarks, and make some benchmarks of our own to understand the performance of this.
Check out this PR https://github.com/rrweb-io/rrweb/pull/903, it should give you a handy guideline. Specifically checkout packages/rrweb/test/benchmark/dom-mutation.test.ts
it has the contents of these benchmarks
@Juice10 I added 2 new benchmarks for masking and unmasking.
Here are the results from master:
Note: the unmasking test on master is effectively the same as the masking test, since unmasking is unsupported there.
Here are the results from my branch:
~(commits with changes to follow soon, currently in rebase hell)~ rebased
@Juice10 ready for another look, i think
@billyvg i synced the branch with master, i'm looking into the unmasking issue you pointed out
So here's a case with textareas that will cause a bug: https://github.com/pendo-io/rrweb/compare/md-mask-enhancements-1096..getsentry:rrweb:add-test-for-dynamically-added-inputs#diff-a414b2006f5f91cc65386e7241dbe659dac2ab113e09439483cc817373104a28R11120-R11145
test here: https://github.com/pendo-io/rrweb/compare/md-mask-enhancements-1096..getsentry:rrweb:add-test-for-dynamically-added-inputs#diff-185b8ff868973ccd68a879036d1e5f5abcd83ce0a908c833015b204222676214R650-R673
this also looks wrong: https://github.com/pendo-io/rrweb/compare/md-mask-enhancements-1096..getsentry:rrweb:add-test-for-dynamically-added-inputs#diff-a414b2006f5f91cc65386e7241dbe659dac2ab113e09439483cc817373104a28R15335-R15352
@billyvg
- okay, I see this, I must be missing some case for textarea, b/c it looks like it masked the value attribute, but not the textContent. Will fix that up.
- This one is because maskAllInputs has precedence over the text masking settings. We decided to give the existing input masking options (maskAllInputs and maskInputOptions) highest priority here, so our customers don't end up accidentally unmasking something sensitive. Maybe I just need to update the docs to be more explicit about the masking precedence?
@billyvg
- okay, I see this, I must be missing some case for textarea, b/c it looks like it masked the value attribute, but not the textContent. Will fix that up. What I did on my fork was strip
textContent
completely from textarea, but my implementation ofmaskAllText
was different because it was independent ofmaskAllInputs
. So you could have maskAllText != maskAllInputs, and textareas could have only one ofvalue
/textContent
masked, and the other unmasked. Not sure if this is the case with your impl ofmaskAllText
- This one is because maskAllInputs has precedence over the text masking settings. We decided to give the existing input masking options (maskAllInputs and maskInputOptions) highest priority here, so our customers don't end up accidentally unmasking something sensitive. Maybe I just need to update the docs to be more explicit about the masking precedence?
Ah I see, that makes sense then, i was purely going from what I saw in the test snapshots (the snapshots make the tests hard to reason about). Documenting the masking precedence would definitely be helpful!
but my implementation of maskAllText was different because it was independent of maskAllInputs. So you could have maskAllText != maskAllInputs, and textareas could have only one of value/textContent masked, and the other unmasked. Not sure if this is the case with your impl of maskAllText
Yeah, that's what is happening here too, as far as I can tell. The test sets maskAllInputs to true, which masks the value of the textarea, but the textContent is not masked because that is considered "not an input" and maskAllText is not set to true. Definitely seems incorrect.
@billyvg fixed the textarea masking. it can still be a little wonky with the default masking functions, b/c one (maskTextFn) excludes whitespace from masking, while the other (maskInputFn) masks every character regardless.
@billyvg fixed the textarea masking. it can still be a little wonky with the default masking functions, b/c one (maskTextFn) excludes whitespace from masking, while the other (maskInputFn) masks every character regardless.
Should we have a default masking regex/function?
@Juice10 @YunFeng0817 Can you take another look when you all get the chance?
@Juice10 @YunFeng0817 something I've been playing with in a different branch:
https://github.com/pendo-io/rrweb/commit/6516f547e9b6e9d7a614305f5f87eea47f1547ce
I noticed when taking a snapshot or processing a large mutation, we'd be evaluating masking selectors for the same nodes over and over. This commit caches those results temporarily to avoid re-evaluating the selectors when the result for a given node is known already. In my testing, it seems to provide a minor improvement (~10ms faster for a 100ms snapshot). Thought it might provide a bigger boost, so that was a bit of a disappointment, though I've seen it be quite a bit faster in certain configurations.
Not sure my approach here is the best/cleanest, but figured I'd throw it out there in case performance concerns are still a sticking point on this PR.
I haven't read this PR in detail, but one idea could be that maskDistance
be passed down the call chain (and incremented at each level) like the boolean needsMask
is passed down in #1349 (✓that is merged to trunk now)
although some sort of distanceToMatch
would likely still be needed for checking whether a mask is needed in a newly added node from a mutation
So https://github.com/rrweb-io/rrweb/pull/1349 is merged now which I wrote before being aware of this PR; it fixes performance problems with the prior approach namely the constant looking back up the tree when recursing during full snapshot. The approach from that PR should render the need for caching moot, as each node should only be checked once (with the exception of mutations, where you always need to look back up the tree after insertion)
I've tried to rebase this PR based on that, but the merge is non-trivial, but I might be able to do it given half a day.
Possibly it needs a new approach now:
- having a boolean variable which controls mask/unmask passing down the tree might now do the job of distance calculation (in #1349 we can stop further checking after this is found to be true, as we don't have to consider the 'unmask' case)
- for mutations, where both mask & unmask are set, we could do
closestParent = el.closest(maskSelector + ', ' + unmaskSelector)
which will might be good enough to find the closest matching either one, at which point we could do closestParent.matches(maskSelector) vs. closestParent.matches(unmaskSelector)
Another option as a first step would be to disallow maskTextClass/Selector
when maskAllText
is on, in order to avoid any distance calculation.
(Also, it might be nice to first merge the maskTextClass/Selector
options for simplicity, and/or combine all into a single 'maskTextOptions' object similar to MaskInputOptions)
@eoghanmurray i'll work on rebasing and modifying the approach
If you wish to contribute the fix for #874 as a separate PR that might make review easier, we should be able to fasttrack that.
@eoghanmurray i rebased and updated this PR to only address #874. I removed all the new unmasking logic, since I can apply that through the maskText/InputFn callbacks now that they receive the element as a parameter.
I also added a check for maskTextSelector == '*'
to return true faster from needMaskingText
without needing to run matches or closest.
@eoghanmurray it has been awhile since we commented on this. This PR has been cut down to just address #874 , could we get another look here?