rrweb icon indicating copy to clipboard operation
rrweb copied to clipboard

text masking settings apply to inputs

Open mdellanoce opened this issue 2 years ago • 21 comments

  • inputs respect text masking options in addition to input masking options
  • fixes maskInputFn not passed to initial snapshot
  • fixes #874
  • adds a fast return from needMaskingText if maskTextSelector is *

mdellanoce avatar Jan 17 '23 15:01 mdellanoce

Hi @mdellanoce, thanks for creating this Pull Request, I think this is a great idea to help give people more privacy and this should definitely be included in rrweb.

I did notice one potential issue with this implementation: the needMaskingText function has been a performance problem in the past because it gets called on every single node that gets recorded (sometimes tens-of-thousands or even hundreds of thousands of nodes get recorded at once), and because the new needMaskingText now compares distances for both masked and unmasked parents its doing twice as much work. Maybe you could implement some escape hatches to simplify the work it has to do, for example if the is no masked parent, then there is no need to figure out what the unmasked distance is.

Thanks again for submitting this, I think if we can get this performant this would be a great addition to rrweb!

Juice10 avatar Feb 03 '23 11:02 Juice10

@Mark-Fenng @Juice10 thanks for the feedback, totally understand the performance concerns. I think the maskAllText suggestion will work, and I have a few other ideas. I'll see what I can do and hopefully ping you back in a few days.

mdellanoce avatar Feb 03 '23 13:02 mdellanoce

🦋 Changeset detected

Latest commit: b1a1922e59a123fdeb7bf381df0fd2cd6c3719dd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 8 packages
Name Type
rrweb-snapshot Patch
rrweb Patch
rrdom Patch
rrdom-nodejs Patch
rrweb-player Patch
@rrweb/types Patch
@rrweb/web-extension Patch
rrvideo Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

changeset-bot[bot] avatar Feb 13 '23 19:02 changeset-bot[bot]

@YunFeng0817 @Juice10 sorry for the delay, but I added 3 commits to address performance:

  1. if maskAllText is true, and unmasking either is not configured (best case) or fails to find a match (worst case), then the masking distance is not computed, and needsMaskingText returns true early
  2. I removed the default for unmaskTextClass, so that unmasking overhead won't "surprise" anyone with existing setups until they opt-in to use it.
  3. I reworked the distance computation so that it is a single walk from the element to the root node. Each step will check for class match first, then check selectors. Also, in the masking distance computation, if the computed distance at a given step exceeds what was found for the unmasking distance, it'll stop walking upwards immediately and return a non-match.

Hope that all makes sense. Let me know what you think.

mdellanoce avatar Feb 13 '23 21:02 mdellanoce

@mdellanoce This PR is coming along really nicely. Since performance is such an important part of this it probably makes sense to re-run the benchmarks, and make some benchmarks of our own to understand the performance of this. Check out this PR https://github.com/rrweb-io/rrweb/pull/903, it should give you a handy guideline. Specifically checkout packages/rrweb/test/benchmark/dom-mutation.test.ts it has the contents of these benchmarks

Juice10 avatar Mar 15 '23 07:03 Juice10

@Juice10 I added 2 new benchmarks for masking and unmasking.

Here are the results from master:

image image image image

Note: the unmasking test on master is effectively the same as the masking test, since unmasking is unsupported there.

Here are the results from my branch:

image image image image

~(commits with changes to follow soon, currently in rebase hell)~ rebased

mdellanoce avatar Apr 10 '23 16:04 mdellanoce

@Juice10 ready for another look, i think

mdellanoce avatar Apr 10 '23 17:04 mdellanoce

@billyvg i synced the branch with master, i'm looking into the unmasking issue you pointed out

mdellanoce avatar Jul 13 '23 14:07 mdellanoce

So here's a case with textareas that will cause a bug: https://github.com/pendo-io/rrweb/compare/md-mask-enhancements-1096..getsentry:rrweb:add-test-for-dynamically-added-inputs#diff-a414b2006f5f91cc65386e7241dbe659dac2ab113e09439483cc817373104a28R11120-R11145

test here: https://github.com/pendo-io/rrweb/compare/md-mask-enhancements-1096..getsentry:rrweb:add-test-for-dynamically-added-inputs#diff-185b8ff868973ccd68a879036d1e5f5abcd83ce0a908c833015b204222676214R650-R673


this also looks wrong: https://github.com/pendo-io/rrweb/compare/md-mask-enhancements-1096..getsentry:rrweb:add-test-for-dynamically-added-inputs#diff-a414b2006f5f91cc65386e7241dbe659dac2ab113e09439483cc817373104a28R15335-R15352

billyvg avatar Jul 13 '23 21:07 billyvg

@billyvg

  1. okay, I see this, I must be missing some case for textarea, b/c it looks like it masked the value attribute, but not the textContent. Will fix that up.
  2. This one is because maskAllInputs has precedence over the text masking settings. We decided to give the existing input masking options (maskAllInputs and maskInputOptions) highest priority here, so our customers don't end up accidentally unmasking something sensitive. Maybe I just need to update the docs to be more explicit about the masking precedence?

mdellanoce avatar Jul 14 '23 15:07 mdellanoce

@billyvg

  1. okay, I see this, I must be missing some case for textarea, b/c it looks like it masked the value attribute, but not the textContent. Will fix that up. What I did on my fork was strip textContent completely from textarea, but my implementation of maskAllText was different because it was independent of maskAllInputs. So you could have maskAllText != maskAllInputs, and textareas could have only one of value/textContent masked, and the other unmasked. Not sure if this is the case with your impl of maskAllText
  1. This one is because maskAllInputs has precedence over the text masking settings. We decided to give the existing input masking options (maskAllInputs and maskInputOptions) highest priority here, so our customers don't end up accidentally unmasking something sensitive. Maybe I just need to update the docs to be more explicit about the masking precedence?

Ah I see, that makes sense then, i was purely going from what I saw in the test snapshots (the snapshots make the tests hard to reason about). Documenting the masking precedence would definitely be helpful!

billyvg avatar Jul 14 '23 16:07 billyvg

but my implementation of maskAllText was different because it was independent of maskAllInputs. So you could have maskAllText != maskAllInputs, and textareas could have only one of value/textContent masked, and the other unmasked. Not sure if this is the case with your impl of maskAllText

Yeah, that's what is happening here too, as far as I can tell. The test sets maskAllInputs to true, which masks the value of the textarea, but the textContent is not masked because that is considered "not an input" and maskAllText is not set to true. Definitely seems incorrect.

mdellanoce avatar Jul 14 '23 18:07 mdellanoce

@billyvg fixed the textarea masking. it can still be a little wonky with the default masking functions, b/c one (maskTextFn) excludes whitespace from masking, while the other (maskInputFn) masks every character regardless.

mdellanoce avatar Jul 17 '23 18:07 mdellanoce

@billyvg fixed the textarea masking. it can still be a little wonky with the default masking functions, b/c one (maskTextFn) excludes whitespace from masking, while the other (maskInputFn) masks every character regardless.

Should we have a default masking regex/function?

@Juice10 @YunFeng0817 Can you take another look when you all get the chance?

billyvg avatar Jul 17 '23 21:07 billyvg

@Juice10 @YunFeng0817 something I've been playing with in a different branch:

https://github.com/pendo-io/rrweb/commit/6516f547e9b6e9d7a614305f5f87eea47f1547ce

I noticed when taking a snapshot or processing a large mutation, we'd be evaluating masking selectors for the same nodes over and over. This commit caches those results temporarily to avoid re-evaluating the selectors when the result for a given node is known already. In my testing, it seems to provide a minor improvement (~10ms faster for a 100ms snapshot). Thought it might provide a bigger boost, so that was a bit of a disappointment, though I've seen it be quite a bit faster in certain configurations.

Not sure my approach here is the best/cleanest, but figured I'd throw it out there in case performance concerns are still a sticking point on this PR.

mdellanoce avatar Aug 25 '23 16:08 mdellanoce

I haven't read this PR in detail, but one idea could be that maskDistance be passed down the call chain (and incremented at each level) like the boolean needsMask is passed down in #1349 (✓that is merged to trunk now)


although some sort of distanceToMatch would likely still be needed for checking whether a mask is needed in a newly added node from a mutation

eoghanmurray avatar Nov 17 '23 17:11 eoghanmurray

So https://github.com/rrweb-io/rrweb/pull/1349 is merged now which I wrote before being aware of this PR; it fixes performance problems with the prior approach namely the constant looking back up the tree when recursing during full snapshot. The approach from that PR should render the need for caching moot, as each node should only be checked once (with the exception of mutations, where you always need to look back up the tree after insertion)

I've tried to rebase this PR based on that, but the merge is non-trivial, but I might be able to do it given half a day.

Possibly it needs a new approach now:

  • having a boolean variable which controls mask/unmask passing down the tree might now do the job of distance calculation (in #1349 we can stop further checking after this is found to be true, as we don't have to consider the 'unmask' case)
  • for mutations, where both mask & unmask are set, we could do closestParent = el.closest(maskSelector + ', ' + unmaskSelector) which will might be good enough to find the closest matching either one, at which point we could do closestParent.matches(maskSelector) vs. closestParent.matches(unmaskSelector)

Another option as a first step would be to disallow maskTextClass/Selector when maskAllText is on, in order to avoid any distance calculation.

(Also, it might be nice to first merge the maskTextClass/Selector options for simplicity, and/or combine all into a single 'maskTextOptions' object similar to MaskInputOptions)

eoghanmurray avatar Nov 24 '23 16:11 eoghanmurray

@eoghanmurray i'll work on rebasing and modifying the approach

mdellanoce avatar Nov 27 '23 14:11 mdellanoce

If you wish to contribute the fix for #874 as a separate PR that might make review easier, we should be able to fasttrack that.

eoghanmurray avatar Nov 28 '23 10:11 eoghanmurray

@eoghanmurray i rebased and updated this PR to only address #874. I removed all the new unmasking logic, since I can apply that through the maskText/InputFn callbacks now that they receive the element as a parameter.

I also added a check for maskTextSelector == '*' to return true faster from needMaskingText without needing to run matches or closest.

mdellanoce avatar Dec 08 '23 14:12 mdellanoce

@eoghanmurray it has been awhile since we commented on this. This PR has been cut down to just address #874 , could we get another look here?

colingm avatar May 01 '24 17:05 colingm