[Improvement-17670][Worker-monitoring] Add disk usage monitoring for data.basedir.path directory
Purpose of the pull request
close #17670
Brief change log
Feature Enhancement
- Added disk usage monitoring for the data.basedir.path directory
  - Added dataBasedirPathDiskUsagePercentage field in Worker heartbeat data
  - Added display of dataBasedir disk usage on the frontend monitoring page
  - Added internationalization support (Chinese and English)
- Implemented load protection based on disk usage of the data.basedir.path directory
  - Added maxDataBasedirDiskUsagePercentageThresholds configuration item in BaseServerLoadProtectionConfig
  - Implemented disk usage check logic for the dataBasedir path in BaseServerLoadProtection
  - Added max-data-basedir-disk-usage-percentage-thresholds configuration option in Worker config files
Configuration Updates
- Kubernetes Deployment Configuration
  - Added description of the environment variable WORKER_SERVER_LOAD_PROTECTION_MAX_DATA_BASEDIR_DISK_USAGE_PERCENTAGE_THRESHOLDS in README.md
  - Added corresponding configuration items in values.yaml
- Docker Deployment Configuration
  - Added WORKER_SERVER_LOAD_PROTECTION_MAX_DATA_BASEDIR_DISK_USAGE_PERCENTAGE_THRESHOLDS configuration in all test docker-compose.yaml files
UI Improvements
- Adjusted layout of the Worker monitoring page
  - Added data directory disk usage metric; increased number of icons per row from 4 to 5, ensuring all monitoring metrics are displayed on the same line
These changes enhance DolphinScheduler's disk monitoring capabilities by providing fine-grained monitoring and overload protection for the data.basedir.path directory, helping prevent service issues caused by insufficient disk space.
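For readers who want a concrete picture of the disk usage check described in this change log, here is a minimal sketch. It is not the code from this PR: the class name DataBasedirDiskCheck and its methods are hypothetical, usage is expressed as a fraction (0.0 to 1.0) to match the 0.8-style thresholds used elsewhere in this thread, and treating only values strictly above the threshold as overloaded is an assumption.

```java
import java.io.File;

// Hypothetical helper for illustration; not DolphinScheduler's BaseServerLoadProtection.
final class DataBasedirDiskCheck {

    private DataBasedirDiskCheck() {
    }

    /** Returns the used fraction (0.0 to 1.0) given total and usable bytes. */
    static double usedFraction(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            // File.getTotalSpace() returns 0 when the path does not name a partition;
            // report 0.0 instead of dividing by zero.
            return 0.0;
        }
        return (double) (totalBytes - usableBytes) / totalBytes;
    }

    /** Reads the used fraction of the partition that holds the given directory. */
    static double usedFraction(File dir) {
        return usedFraction(dir.getTotalSpace(), dir.getUsableSpace());
    }

    /** Assumption: the worker counts as overloaded only when usage is strictly above the threshold. */
    static boolean isOverloaded(double usedFraction, double maxThreshold) {
        return usedFraction > maxThreshold;
    }
}
```

A caller would pass the directory configured as data.basedir.path, for example `DataBasedirDiskCheck.isOverloaded(DataBasedirDiskCheck.usedFraction(new File(dataBasedir)), 0.8)`, where `dataBasedir` is whatever path the Worker is configured with.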
Verify this pull request
This pull request is already covered by existing tests, such as WorkerServerLoadProtectionTest.
Pull Request Notice
If your pull request contains an incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md
Hi @SbloodyS @ruanwenjun @EricGao888 I've addressed the CI issues reported earlier: I updated the docker-compose.yaml configurations for the new WORKER_SERVER_LOAD_PROTECTION_MAX_DATA_BASEDIR_DISK_USAGE_PERCENTAGE_THRESHOLDS setting, and resolved the code style and CodeQL warnings (field masks in config classes). Could you please review the changes and let me know if anything else needs adjustment? Thanks!
The latest build error (No plugin found for prefix 'sonar') comes from a missing SonarQube plugin in CI and is unrelated to my PR changes.
Adding max-data-basedir-disk-usage-percentage-thresholds will conflict with the current max-disk-usage-percentage-thresholds, which will make it more difficult for users to understand. I think we should configure multiple directories in one of the following two ways:
1.
max-disk-usage-percentage-thresholds:
  /data1: 0.8
  /data2: 0.9
2.
max-disk-usage-percentage-thresholds:
  path: /data1,/data2
  percentage: 0.9
This needs to be discussed. cc @ruanwenjun @zhongjiajie @Gallardot
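To compare the two layouts, note that the second form (a comma-separated path list sharing one percentage) can be expanded into the per-path map of the first form. The sketch below illustrates that expansion; the class SharedThresholdPaths and its method are hypothetical and exist only for this example, and skipping blank entries is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of DolphinScheduler.
final class SharedThresholdPaths {

    private SharedThresholdPaths() {
    }

    /** Expands "path: /data1,/data2" plus one percentage into a per-path threshold map. */
    static Map<String, Double> expand(String commaSeparatedPaths, double percentage) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String raw : commaSeparatedPaths.split(",")) {
            String path = raw.trim();
            if (!path.isEmpty()) {
                // Every listed path gets the same shared threshold.
                result.put(path, percentage);
            }
        }
        return result;
    }
}
```

The first form is strictly more expressive, since it allows a different threshold per directory, while the second is shorter when all directories share one limit.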
Thank you for your suggestion! I understand your concerns about potential configuration conflicts. To maintain backward compatibility and reduce the burden on users to manually specify the Worker’s deployment directory, would the following approach work?
server-load-protection:
  max-disk-usage-percentage-thresholds: 0.8  # Continue monitoring the Worker's deployment directory (backward compatible)
  additional-disk-paths:  # Optional: monitor additional directories
    /tmp/dolphinscheduler: 0.9
    /var/log: 0.85
Benefits of This Approach
- Full backward compatibility: Existing configurations like max-disk-usage-percentage-thresholds: 0.8 will keep working as before, automatically applying to the Worker’s deployment directory.
- User-friendly: Users don’t need to know or configure the exact deployment path; the system handles it automatically.
- No frontend changes required: The UI can continue displaying disk usage for the Worker’s deployment directory without modification, which avoids overcomplicating the UI.
- Extensible: When needed, users can optionally define additional paths to monitor via additional-disk-paths.
What do you think of this proposal? @SbloodyS @ruanwenjun @zhongjiajie @Gallardot
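A rough sketch of how the proposed layout could be resolved at startup is shown here. The names DiskThresholdResolver, resolve, and deployDir are hypothetical, and letting an entry in additional-disk-paths override the default for the same path is an assumption rather than something decided in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of DolphinScheduler.
final class DiskThresholdResolver {

    private DiskThresholdResolver() {
    }

    /**
     * Builds the effective path-to-threshold map: the Worker's deployment directory
     * always uses the existing single threshold, and optional extra paths are added on top.
     */
    static Map<String, Double> resolve(String deployDir,
                                       double defaultThreshold,
                                       Map<String, Double> additionalDiskPaths) {
        Map<String, Double> result = new LinkedHashMap<>();
        result.put(deployDir, defaultThreshold);
        if (additionalDiskPaths != null) {
            // Assumption: an explicit entry for the same path replaces the default.
            result.putAll(additionalDiskPaths);
        }
        return result;
    }
}
```

With no additional-disk-paths configured, the result contains only the deployment directory, which matches the backward-compatible behavior described above.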
@dill21yu It’s preferable to retain the existing configuration key max-disk-usage-percentage-thresholds, but mark it as deprecated in the documentation.
Introduce a new configuration:
max-disk-usage-percentage-thresholds-rules:
  - disk-path: /dev1
    usage-percentage-thresholds: 0.9
  - disk-path: /dev2
    usage-percentage-thresholds: 0.8
When the old configuration max-disk-usage-percentage-thresholds is used, we should log a warning indicating that it is deprecated and recommend switching to the new max-disk-usage-percentage-thresholds-rules configuration.
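The deprecation path suggested here could look roughly like the following sketch. Everything in it is hypothetical: the class DiskRuleConfig, the Rule type, and the defaultPath parameter are illustration names, java.util.logging stands in for whatever logger the project uses, and giving the new rules precedence when both keys are set is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper for illustration; not part of DolphinScheduler.
final class DiskRuleConfig {

    private static final Logger LOG = Logger.getLogger(DiskRuleConfig.class.getName());

    private DiskRuleConfig() {
    }

    /** One entry of max-disk-usage-percentage-thresholds-rules. */
    static final class Rule {
        final String diskPath;
        final double threshold;

        Rule(String diskPath, double threshold) {
            this.diskPath = diskPath;
            this.threshold = threshold;
        }
    }

    /**
     * Returns the rules to enforce. legacyThreshold is the value of the old
     * max-disk-usage-percentage-thresholds key, or null when it is not set.
     */
    static List<Rule> effectiveRules(Double legacyThreshold, List<Rule> rules, String defaultPath) {
        if (legacyThreshold != null) {
            LOG.warning("max-disk-usage-percentage-thresholds is deprecated; "
                    + "use max-disk-usage-percentage-thresholds-rules instead");
        }
        if (rules != null && !rules.isEmpty()) {
            // Assumption: the new rules win when both keys are present.
            return new ArrayList<>(rules);
        }
        List<Rule> result = new ArrayList<>();
        if (legacyThreshold != null) {
            // Fall back to the old single threshold, applied to the default path.
            result.add(new Rule(defaultPath, legacyThreshold));
        }
        return result;
    }
}
```

Keeping the fallback in one place means existing deployments that only set the old key keep their current behavior while still seeing the warning on startup.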
