Server-side data scrubbing replaces Java object identifiers with [email]

Open chylek-qr opened this issue 2 weeks ago • 5 comments

Environment

self-hosted (https://develop.sentry.dev/self-hosted/)

Steps to Reproduce

Send a report with the following message, which is output by java.lang.management.ThreadInfo#toString and includes identifiers of Java objects that are currently locked. This is critical information in our deadlock reports.

"DeadlockSimulator1" prio=5 Id=30 BLOCKED on java.lang.Object@4b1c1ea0 owned by "DeadlockSimulator2" Id=31
	at app//DeadlockNotifier.lambda$simulateDeadlock$0(DeadlockNotifier.java:130)
	-  blocked on java.lang.Object@4b1c1ea0
	-  locked java.lang.Object@74650e52
	at app//DeadlockNotifier$$Lambda/0x00000e0001047a10.run(Unknown Source)
	at java.base@25/java.lang.Thread.runWith(Thread.java:1487)
	at java.base@25/java.lang.Thread.run(Thread.java:1474)

Expected Result

Leave the message as-is because it contains no sensitive data.

Actual Result

Sentry replaces the Java object identifiers (and also module identifiers in the stack trace) with [email], and now I can't tell what the locked objects are.

"DeadlockSimulator1" prio=5 Id=30 BLOCKED on [email] owned by "DeadlockSimulator2" Id=31
	at app//DeadlockNotifier.lambda$simulateDeadlock$0(DeadlockNotifier.java:130)
	-  blocked on [email]
	-  locked [email]
	at app//DeadlockNotifier$$Lambda/0x00000e0001047a10.run(Unknown Source)
	at [email]/java.lang.Thread.runWith(Thread.java:1487)
	at [email]/java.lang.Thread.run(Thread.java:1474)

There appears to be no way to configure the default scrubbing (for example, to disable only email scrubbing), so the only option seems to be disabling scrubbing entirely.

Product Area

Unknown

Link

No response

DSN

No response

Version

25.8.0

chylek-qr avatar Dec 05 '25 04:12 chylek-qr

ENG-6055

linear[bot] avatar Dec 05 '25 04:12 linear[bot]

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] avatar Dec 05 '25 04:12 getsantry[bot]

This is not a bug; it is an intended feature. Since probably August 2025, Sentry has been redacting PII data in the SDK and on Relay. The documentation is here: https://docs.sentry.io/platforms/java/data-management/sensitive-data/

You should set sendDefaultPii to true on your Sentry configuration: https://docs.sentry.io/platforms/java/configuration/options/#sendDefaultPii

aldy505 avatar Dec 07 '25 05:12 aldy505

@adinauer I believe the Java SDK (or perhaps Relay?) mistakenly redacts java.lang.Object@4b1c1ea0 as an email address.

aldy505 avatar Dec 07 '25 05:12 aldy505

You should set sendDefaultPii to true on your Sentry configuration.

In the fairly old version of the Java SDK we're using (6.29.0), the only places where SentryOptions.isSendDefaultPii is used relate to some User data and FileIOSpanManager, neither of which is related to SentryEvent messages. I don't see how the Java SDK could be redacting the message.

chylek-qr avatar Dec 07 '25 09:12 chylek-qr

Just tested and the SDK sends the String as is. Relay has a regex for email, but I'm not sure how you could override it to e.g. only consider it an email address if it includes a TLD. I'll ask internally if there's a way. Sorry for the inconvenience.
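
To illustrate why a permissive email regex would catch these strings, here is a minimal sketch. The pattern below is an assumption chosen for illustration and is not Relay's actual regex; the class and method names are hypothetical. It only shows that a pattern which does not require a dot in the domain part treats both java.lang.Object@4b1c1ea0 and java.base@25 as email-like.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Sentry SDK or Relay.
final class EmailScrubSketch {

    // ASSUMPTION: a lenient email-like pattern that does not require a dot
    // (TLD) in the domain part. Relay's real pattern may differ.
    static final Pattern LENIENT_EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+");

    // Replaces every email-like match with the placeholder seen in the report.
    static String scrub(String text) {
        return LENIENT_EMAIL.matcher(text).replaceAll("[email]");
    }

    public static void main(String[] args) {
        // prints "-  blocked on [email]"
        System.out.println(scrub("-  blocked on java.lang.Object@4b1c1ea0"));
        // prints "at [email]/java.lang.Thread.run(Thread.java:1474)"
        System.out.println(scrub("at java.base@25/java.lang.Thread.run(Thread.java:1474)"));
    }
}
```

Lines without an @ character, such as the app//DeadlockNotifier frames, pass through unchanged, which is consistent with the output in the report.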

adinauer avatar Dec 15 '25 10:12 adinauer

@chylek-qr there are a couple of ways this could be solved:

  1. Send the Thread status as a separate field that isn't scrubbed by default:
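      // Imports from the Sentry Java SDK
      import io.sentry.Sentry;
      import io.sentry.SentryEvent;
      import io.sentry.protocol.Message;

      SentryEvent event = new SentryEvent();
      Message message = new Message();
      message.setMessage("this will be scrubbed");
      event.setMessage(message);
      // Extras were not scrubbed by the default rules in my testing
      event.setExtra("thread-status", "this data was not scrubbed during my testing");
      Sentry.captureEvent(event);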
  2. While the team thought about making it possible to disable specific regexes/rules, it's a big effort and will not happen anytime soon. There's no concrete plan for this at the moment.
  3. We could in theory update the filter regex to expect a . in the domain part of the email address. While it might just be internal email addresses that are not filtered by this regex, we're not certain about this and thus might risk leaking PII that is currently being filtered. Since we don't have the unfiltered data, we might need to first put some stats in place to learn the impact of this change before going through with it.
  4. As another workaround, you could add the affected field to "Safe Fields" and then re-add rules for filtering on that field under "Advanced Data Scrubbing". You'd want to add a rule for each data type. Unfortunately, I wasn't able to get this to work for the message field. It looks like there's some special handling in place for it. I tried adding message and logentry.formatted under safe fields, then sent in an event, and it was still filtered.
  5. As a different workaround, you could disable default scrubbers completely and re-add them manually. I wouldn't recommend this option.
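
The trade-off of requiring a . in the domain part can be sketched as follows. Both patterns are assumptions made up for this comparison and neither is Relay's actual regex; the class and method names are hypothetical. The stricter pattern stops matching Java object identifiers, but it also stops matching dot-less internal addresses, which is the PII leak risk mentioned above.

```java
import java.util.regex.Pattern;

// Hypothetical comparison for illustration; not Relay's implementation.
final class EmailPatternComparison {

    // ASSUMPTION: lenient pattern, no dot required in the domain part.
    static final Pattern LENIENT =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+");

    // ASSUMPTION: stricter pattern, domain must contain at least one dot.
    static final Pattern DOTTED_DOMAIN =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+");

    // True if the pattern finds an email-like substring, i.e. would scrub it.
    static boolean wouldScrub(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "java.lang.Object@74650e52", // Java object identifier
            "user@example.com",          // ordinary email address
            "admin@localhost"            // internal address without a dot
        };
        for (String sample : samples) {
            System.out.println(sample
                    + " lenient=" + wouldScrub(LENIENT, sample)
                    + " dotted=" + wouldScrub(DOTTED_DOMAIN, sample));
        }
    }
}
```

Under these assumed patterns, the object identifier and admin@localhost are scrubbed only by the lenient pattern, while user@example.com is scrubbed by both.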

Does option 1 work for you?

adinauer avatar Dec 17 '25 11:12 adinauer

The message containing the thread data is quite large, possibly hundreds of lines, so logically it makes the most sense for it to be in the message. I have already disabled the default scrubbers as the easiest solution. If it becomes possible in the future to reconfigure or completely disable email scrubbing, which we don't need anyway, I will re-enable the other default scrubbers.

chylek-qr avatar Dec 17 '25 12:12 chylek-qr