dolphinscheduler
dolphinscheduler copied to clipboard
[Feature-10498] Mask the password in the log of sqoop task
Purpose of the pull request
Mask the password in the log of sqoop
task.
Currently, there are 2 positions that the log of sqoop
task will output the password of mysql:
https://github.com/apache/dolphinscheduler/blob/17a9dd25fa0e80b048394f79db130f56eb8ef72f/dolphinscheduler-task-plugin/dolphinscheduler-task-sqoop/src/main/java/org/apache/dolphinscheduler/plugin/task/sqoop/SqoopTask.java#L83
https://github.com/apache/dolphinscheduler/blob/17a9dd25fa0e80b048394f79db130f56eb8ef72f/dolphinscheduler-task-plugin/dolphinscheduler-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/ShellCommandExecutor.java#L116
Brief change log
Verify this pull request
Manually tested.
data:image/s3,"s3://crabby-images/5fdb9/5fdb9bcb9457eca7b0c707a392547b108b66a568" alt="截屏2022-08-22 15 30 38"
related: #10498
This is a good feature. Could you please add a UT for it? Thanks.
This is a good feature. Could you please add a UT for it? Thanks.
Hi, @EricGao888, Sorry for the late response. I've added a UT.
Codecov Report
Merging #11589 (6708c75) into dev (64a29c6) will increase coverage by
0.00%
. The diff coverage is44.44%
.
@@ Coverage Diff @@
## dev #11589 +/- ##
=========================================
Coverage 39.35% 39.36%
Complexity 4270 4270
=========================================
Files 1067 1067
Lines 40118 40121 +3
Branches 4606 4605 -1
=========================================
+ Hits 15790 15794 +4
+ Misses 22552 22547 -5
- Partials 1776 1780 +4
Impacted Files | Coverage Δ | |
---|---|---|
.../dolphinscheduler/plugin/task/sqoop/SqoopTask.java | 0.00% <0.00%> (ø) |
|
...inscheduler/common/log/SensitiveDataConverter.java | 63.15% <50.00%> (ø) |
|
...erver/master/processor/queue/TaskEventService.java | 69.64% <0.00%> (-10.72%) |
:arrow_down: |
...dolphinscheduler/remote/future/ResponseFuture.java | 81.96% <0.00%> (-1.64%) |
:arrow_down: |
...r/plugin/registry/zookeeper/ZookeeperRegistry.java | 50.00% <0.00%> (+6.45%) |
:arrow_up: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
May I ask whether we could do this with some other ways in
sqoop task plugin
instead of add a specific condition here forsqoop
? ShellCommandExecutor is shared across different task plugins, it might not be a good practice and uneasy to maintain if we need to change it every time we need to mask something for a task plugin. The same for line 55-56.
Hi, @EricGao888. Thanks for your suggestions. I have moved the masking logic to a common util class.
Background
There are 2 possible locations that the log may contain the sensitive data (e.g., password). Taking the sqoop task as an example:
- The log output in the
buildCommand()
inSqoopTask
.
https://github.com/apache/dolphinscheduler/blob/28469fc9b23094a74cbef83b9990d6fbced20d79/dolphinscheduler-task-plugin/dolphinscheduler-task-sqoop/src/main/java/org/apache/dolphinscheduler/plugin/task/sqoop/SqoopTask.java#L82-L84
- The log output in the
createCommandFileIfNotExists(String execCommand, String commandFile)
inShellCommandExecutor
. We need to mask the sensitive data ofexecCommand
here since theexecCommand
will be output in the log. But this class is shared by multiple task plugins and how to mask the sensitive data needs to be discussed.
https://github.com/apache/dolphinscheduler/blob/17a9dd25fa0e80b048394f79db130f56eb8ef72f/dolphinscheduler-task-plugin/dolphinscheduler-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/ShellCommandExecutor.java#L115-L116
As for me, there are 2 ways:
1. Mask the sensitive data at the point of outputting the task log and every task plugin applys the same masking logic
- Encapsulate the
logger
ofAbstractTaskExecutor
. Every time the task log is output, the output statement will be matched to see if there is any sensitive data.
https://github.com/apache/dolphinscheduler/blob/ebcffb04aad9db8ec6df1105e4770b187088e701/dolphinscheduler-task-plugin/dolphinscheduler-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/AbstractTaskExecutor.java#L37
- By default, common sensitive data will be matched, and users can customize the sensitive information that needs to be masked.
data:image/s3,"s3://crabby-images/9b55f/9b55ffd2e2f46fdc1bf95b54db104fcd61a01903" alt="image"
Note that the command after masking will only used to output in the log, so this method will not affect the actual command executed.
Reference: Sensitive data masking of Airflow
2. Each task plugin implements its own masking logic
Refactor the createCommandFileIfNotExists()
and add a param String execCommandMasking
.
https://github.com/apache/dolphinscheduler/blob/28469fc9b23094a74cbef83b9990d6fbced20d79/dolphinscheduler-task-plugin/dolphinscheduler-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/ShellCommandExecutor.java#L84-L87
Also AbstractCommandExecutor.run(execCommand)
needs to be refactored to AbstractCommandExecutor.run(execCommand, execCommandMasking)
.
https://github.com/apache/dolphinscheduler/blob/28469fc9b23094a74cbef83b9990d6fbced20d79/dolphinscheduler-task-plugin/dolphinscheduler-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/AbstractCommandExecutor.java#L176
By doing this, the process logic of how to mask the sensitive data can be done in each task plugin. But this will modify AbstractCommandExecutor
and ShellCommandExecutor
, and many task plugins need to be modified and implement their own masking logic.
Below is the list of affected classes:
data:image/s3,"s3://crabby-images/38917/38917f3e054767d23e5e957f9b722c083a61108d" alt="image"
Conclusion
I personally prefer the first way to mask the sensitive data. And I also sent a related discussion email to the mailing list.
So which way do you think is better? Or there are other better ways to do so. Any comments or suggestions are welcome!
We have use SensitiveDataConverter
for the whole log to hide the password.
We have use
SensitiveDataConverter
for the whole log to hide the password.
Thanks for your suggestion! I'll look into it.
We have use
SensitiveDataConverter
for the whole log to hide the password.
Hi, @ruanwenjun , It seems that there is a bug in SensitiveDataConverter
which will be fixed in #11459
After #11459 is fixed, maybe we can add a pattern in SensitiveDataConverter
to mask the password in sqoop
task.
Hi, @ruanwenjun , @EricGao888 , I've made changes to my PR as below:
- Use
SensitiveDataConverter
to uniformly mask sensitive information in task logs. - Each task plugin can add its own regular match expressions to
SensitiveDataConverter
throughaddMaskPattern()
. E.g.,Sqoop
task adds its own regular expression ininit()
:
SensitiveDataConverter.addMaskPattern(SqoopConstants.SQOOP_PASSWORD_REGEX);
This PR solves the problem of the mysql password in the sqoop task log. If others find that other task types will output sensitive information in the log, they only need to add their own regular expression through addMaskPattern()
in the init()
of the task plugin.
- Use
SensitiveDataConverter
to uniformly mask sensitive information in task logs.- Each task plugin can add its own regular match expressions to
SensitiveDataConverter
throughaddMaskPattern()
. E.g.,Sqoop
task adds its own regular expression ininit()
:SensitiveDataConverter.addMaskPattern(SqoopConstants.SQOOP_PASSWORD_REGEX);
This PR solves the problem of the mysql password in the sqoop task log. If others find that other task types will output sensitive information in the log, they only need to add their own regular expression through
addMaskPattern()
in theinit()
of the task plugin.
Hi, @ruanwenjun , could you please help review this? I've rebased to remove conflicts.
Hi, @ruanwenjun , @EricGao888 , I've made changes to my PR as below:
- Use
SensitiveDataConverter
to uniformly mask sensitive information in task logs.- Each task plugin can add its own regular match expressions to
SensitiveDataConverter
throughaddMaskPattern()
. E.g.,Sqoop
task adds its own regular expression ininit()
:SensitiveDataConverter.addMaskPattern(SqoopConstants.SQOOP_PASSWORD_REGEX);
This PR solves the problem of the mysql password in the sqoop task log. If others find that other task types will output sensitive information in the log, they only need to add their own regular expression through
addMaskPattern()
in theinit()
of the task plugin.
Hi, @EricGao888 @caishunfeng , could you please help review this?
Hi, @caishunfeng , I've moved SensitiveDataConverter
to dolphinscheduler-common
module.
cc @zhongjiajie