[Improvement-17670][Worker-monitoring] Add disk usage monitoring for data.basedir.path directory

Open dill21yu opened this issue 2 months ago • 5 comments

Purpose of the pull request

close #17670

Brief change log

Feature Enhancement

- Added disk usage monitoring for the data.basedir.path directory
  - Added dataBasedirPathDiskUsagePercentage field in Worker heartbeat data
  - Added display of dataBasedir disk usage on the frontend monitoring page
  - Added internationalization support (Chinese and English)
- Implemented load protection based on disk usage of the data.basedir.path directory
  - Added maxDataBasedirDiskUsagePercentageThresholds configuration item in BaseServerLoadProtectionConfig
  - Implemented disk usage check logic for the dataBasedir path in BaseServerLoadProtection
  - Added max-data-basedir-disk-usage-percentage-thresholds configuration option in Worker config files

Configuration Updates

- Kubernetes Deployment Configuration
  - Added description of the environment variable WORKER_SERVER_LOAD_PROTECTION_MAX_DATA_BASEDIR_DISK_USAGE_PERCENTAGE_THRESHOLDS in README.md
  - Added corresponding configuration items in values.yaml
- Docker Deployment Configuration
  - Added WORKER_SERVER_LOAD_PROTECTION_MAX_DATA_BASEDIR_DISK_USAGE_PERCENTAGE_THRESHOLDS configuration in all test docker-compose.yaml files

UI Improvements

- Adjusted layout of the Worker monitoring page
  - Added data directory disk usage metric; increased number of icons per row from 4 to 5, ensuring all monitoring metrics are displayed on the same line

These changes enhance DolphinScheduler's disk monitoring capabilities by providing fine-grained monitoring and overload protection for the data.basedir.path directory, helping prevent service issues caused by insufficient disk space.
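As a rough illustration of the disk usage check described in this change log, the following Java sketch measures how full the file store behind a directory is and compares it with a threshold. It is not the code from this pull request: the class name `DataDirDiskUsage`, its method names, the use of a 0.0 to 1.0 fraction, and the `>=` comparison are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not the class added by the pull request. */
final class DataDirDiskUsage {

    /** Returns the used fraction (0.0 to 1.0) given total and usable bytes. */
    static double usedFraction(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            return 0.0; // unknown or empty file store: report it as unused
        }
        return (double) (totalBytes - usableBytes) / totalBytes;
    }

    /** Reads the usage of the file store that holds the given directory. */
    static double usedFraction(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return usedFraction(store.getTotalSpace(), store.getUsableSpace());
    }

    /** Assumption: the server counts as overloaded once usage reaches the threshold. */
    static boolean isOverloaded(double usedFraction, double threshold) {
        return usedFraction >= threshold;
    }

    public static void main(String[] args) throws IOException {
        // Falls back to the JVM temp directory when no path is passed in.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        double used = usedFraction(dir);
        System.out.printf("%s used=%.3f overloaded=%b%n", dir, used, isOverloaded(used, 0.8));
    }
}
```

Separating the arithmetic from the file system call keeps the threshold logic testable without depending on the state of a real disk.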

Verify this pull request

This pull request is already covered by existing tests, such as WorkerServerLoadProtectionTest.

Pull Request Notice

If your pull request contains an incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md

dill21yu avatar Nov 12 '25 10:11 dill21yu

Quality Gate failed

Failed conditions
2.7% Coverage on New Code (required ≥ 60%)

See analysis details on SonarQube Cloud

sonarqubecloud[bot] avatar Nov 18 '25 08:11 sonarqubecloud[bot]

Hi @SbloodyS @ruanwenjun @EricGao888 I've addressed the CI issues reported earlier: I updated the docker-compose.yaml configurations for the new WORKER_SERVER_LOAD_PROTECTION_MAX_DATA_BASEDIR_DISK_USAGE_PERCENTAGE_THRESHOLDS setting and resolved the code style and CodeQL warnings (field masks in config classes). Could you please review the changes and let me know if anything else needs adjustment? Thanks!

dill21yu avatar Nov 24 '25 06:11 dill21yu

The newest build error (No plugin found for prefix 'sonar') comes from a missing SonarQube plugin in CI and is unrelated to my PR changes.

dill21yu avatar Nov 24 '25 07:11 dill21yu

Adding max-data-basedir-disk-usage-percentage-thresholds will conflict with the current max-disk-usage-percentage-thresholds, which will make it more difficult for users to understand.

I think we should configure multiple directories in one of the following two ways.

1. One threshold per path:

max-disk-usage-percentage-thresholds:
  /data1: 0.8
  /data2: 0.9

2. One shared threshold for a list of paths:

max-disk-usage-percentage-thresholds:
  path: /data1,/data2
  percentage: 0.9

This needs to be discussed. cc @ruanwenjun @zhongjiajie @Gallardot
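To make the difference between the two shapes concrete, here is a minimal Java sketch that normalizes either one into a single path-to-threshold map. It assumes the YAML has already been parsed into a `Map<String, Object>`; the class `ThresholdShapes` and its `normalize` method are hypothetical and do not exist in DolphinScheduler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical normalizer for the two proposed config shapes. */
final class ThresholdShapes {

    /** Converts either shape into an ordered map of path to threshold. */
    static Map<String, Double> normalize(Map<String, Object> raw) {
        Map<String, Double> result = new LinkedHashMap<>();
        // Shape 2: a comma-separated "path" list sharing one "percentage".
        if (raw.containsKey("path") && raw.containsKey("percentage")) {
            double shared = ((Number) raw.get("percentage")).doubleValue();
            for (String path : String.valueOf(raw.get("path")).split(",")) {
                String trimmed = path.trim();
                if (!trimmed.isEmpty()) {
                    result.put(trimmed, shared);
                }
            }
            return result;
        }
        // Shape 1: every key is a path and every value is its own threshold.
        for (Map.Entry<String, Object> entry : raw.entrySet()) {
            result.put(entry.getKey(), ((Number) entry.getValue()).doubleValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> perPath = new LinkedHashMap<>();
        perPath.put("/data1", 0.8);
        perPath.put("/data2", 0.9);
        System.out.println(normalize(perPath));

        Map<String, Object> shared = new LinkedHashMap<>();
        shared.put("path", "/data1,/data2");
        shared.put("percentage", 0.9);
        System.out.println(normalize(shared));
    }
}
```

The first shape allows a different threshold per directory, while the second is shorter but forces every listed directory to share one value.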

Thank you for your suggestion! I understand your concerns about potential configuration conflicts. To maintain backward compatibility and reduce the burden on users to manually specify the Worker’s deployment directory, would the following approach work?

server-load-protection:
  max-disk-usage-percentage-thresholds: 0.8  # Continue monitoring the Worker's deployment directory (backward compatible)
  additional-disk-paths:  # Optional: monitor additional directories
    /tmp/dolphinscheduler: 0.9
    /var/log: 0.85

Benefits of This Approach

- Full backward compatibility: Existing configurations like max-disk-usage-percentage-thresholds: 0.8 will keep working as before, automatically applying to the Worker’s deployment directory.
- User-friendly: Users don’t need to know or configure the exact deployment path; the system handles it automatically.
- No frontend changes required: The UI can continue displaying disk usage for the Worker’s deployment directory without modification, which avoids overcomplicating the UI.
- Extensible: When needed, users can optionally define additional paths to monitor via additional-disk-paths.

What do you think of this proposal? @SbloodyS @ruanwenjun @zhongjiajie @Gallardot
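One way to read this proposal is that the Worker builds its effective threshold table from the existing scalar plus the optional extra paths. The Java sketch below shows that merge under stated assumptions: `EffectiveDiskThresholds`, `resolve`, and the example deployment directory `/opt/dolphinscheduler` are hypothetical, and letting an explicit entry override the default for the same path is a design choice of this example, not something the proposal specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical merge of the legacy threshold with additional-disk-paths. */
final class EffectiveDiskThresholds {

    /**
     * The deployment directory always gets the legacy threshold; extra paths
     * are added afterwards. Assumption: an extra entry for the same path wins.
     */
    static Map<String, Double> resolve(String deploymentDir,
                                       double defaultThreshold,
                                       Map<String, Double> additionalPaths) {
        Map<String, Double> effective = new LinkedHashMap<>();
        effective.put(deploymentDir, defaultThreshold);
        if (additionalPaths != null) {
            effective.putAll(additionalPaths);
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, Double> extra = new LinkedHashMap<>();
        extra.put("/tmp/dolphinscheduler", 0.9);
        extra.put("/var/log", 0.85);
        // "/opt/dolphinscheduler" stands in for the Worker's deployment directory.
        System.out.println(resolve("/opt/dolphinscheduler", 0.8, extra));
    }
}
```

With no `additional-disk-paths` configured, the result contains only the deployment directory, which matches the backward-compatible behavior the proposal describes.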

dill21yu avatar Dec 02 '25 03:12 dill21yu

@dill21yu It’s preferable to retain the existing configuration key max-disk-usage-percentage-thresholds, but mark it as deprecated in the documentation.

Introduce a new configuration:

max-disk-usage-percentage-thresholds-rules:
  - disk-path: /dev1
    usage-percentage-thresholds: 0.9
  - disk-path: /dev2
    usage-percentage-thresholds: 0.8

When the old configuration max-disk-usage-percentage-thresholds is used, we should log a warning indicating that it is deprecated and recommend switching to the new max-disk-usage-percentage-thresholds-rules configuration.
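The deprecation path suggested here could be sketched in Java as follows. This is only an illustration: `DiskThresholdRules`, `Rule`, and `resolve` are hypothetical names, `java.util.logging` stands in for whatever logger the project uses, and preferring the new rules when both keys are set is an assumption the comment above does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical resolver for the deprecated key and the new rules list. */
final class DiskThresholdRules {

    private static final Logger LOG = Logger.getLogger(DiskThresholdRules.class.getName());

    /** One entry of max-disk-usage-percentage-thresholds-rules. */
    static final class Rule {
        final String diskPath;
        final double usagePercentageThreshold;

        Rule(String diskPath, double usagePercentageThreshold) {
            this.diskPath = diskPath;
            this.usagePercentageThreshold = usagePercentageThreshold;
        }
    }

    /**
     * Returns the rules to enforce. A non-null legacy threshold always logs a
     * deprecation warning; it becomes a single rule for defaultPath only when
     * no new-style rules are configured (assumed precedence).
     */
    static List<Rule> resolve(Double legacyThreshold, List<Rule> rules, String defaultPath) {
        if (legacyThreshold != null) {
            LOG.warning("max-disk-usage-percentage-thresholds is deprecated; "
                    + "use max-disk-usage-percentage-thresholds-rules instead");
        }
        if (rules != null && !rules.isEmpty()) {
            return new ArrayList<>(rules);
        }
        List<Rule> resolved = new ArrayList<>();
        if (legacyThreshold != null) {
            resolved.add(new Rule(defaultPath, legacyThreshold));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Legacy-only configuration: one warning, one rule for the default path.
        List<Rule> resolved = resolve(0.8, null, "/dev1");
        System.out.println(resolved.get(0).diskPath + " -> " + resolved.get(0).usagePercentageThreshold);
    }
}
```

Converting the legacy value into a rule keeps a single enforcement path, so the old key can later be removed without touching the check itself.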

ruanwenjun avatar Dec 10 '25 01:12 ruanwenjun