[Feature][Logging] Add support for logging into remote storage

Open EricGao888 opened this issue 3 years ago • 15 comments

Search before asking

  • [X] I have searched the issues and found no similar feature requirement.

Description

  • Add support for remote logging so that task logs can be preserved in a remote storage service such as AWS S3 or Alibaba-Cloud OSS.

Use case

  • Currently, DolphinScheduler does not seem to support logging into remote storage such as AWS S3 or Alibaba-Cloud OSS.
  • If DolphinScheduler is deployed on a remote cluster such as EMR and the cluster is torn down, or something goes wrong with the workers, users will probably lose the logs of previously executed tasks.

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

EricGao888 avatar Feb 25 '22 09:02 EricGao888

Hi:

  • Thank you for your feedback. We have received your issue; please wait patiently for a reply.
  • So that we can understand your request as soon as possible, please provide detailed information, the version, or pictures.
  • If you haven't received a reply for a long time, you can subscribe to the developer mailing list (subscription steps: https://dolphinscheduler.apache.org/en-us/community/development/subscribe.html), then write the issue URL in the email body and send your question to [email protected].

github-actions[bot] avatar Feb 25 '22 09:02 github-actions[bot]

Is it necessary to add this feature in DS? Would it be possible to do this with log monitoring and synchronization instead, similar to Alibaba Cloud's SLS?

caishunfeng avatar Feb 25 '22 10:02 caishunfeng

Good suggestion. Can you provide a design for supporting the remote logging system?

davidzollo avatar Feb 25 '22 11:02 davidzollo

@caishunfeng @dailidong Another question related to this issue: how does DS deal with workers' logs? Without logging into remote storage, how do we keep the large amount of logs produced by workers? Is there a best practice for this?

EricGao888 avatar Feb 28 '22 03:02 EricGao888

DS currently does not handle task log files specifically; users have to manage them on their own, which is unfriendly. Writing task logs to remote storage is a good idea when a large number of tasks keep running. I think there are two points to settle:

  1. How to write files? Specified by configuration or config in the process definition UI?
  2. How to read compatible local files and remote files?

If you have any good ideas, please let me know.
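One possible shape for both points, sketched in Python purely for illustration (DolphinScheduler itself is Java, and nothing here is its actual API). The names `DirRemoteStore` and `TaskLogService` are hypothetical, and a plain directory stands in for an S3 or OSS bucket. Writing is controlled by configuration (remote store present or not), and reading tries the local file first and falls back to the remote copy:

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class DirRemoteStore:
    """Hypothetical stand-in for S3/OSS: a directory plays the role of a bucket."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def upload(self, local_file: Path, key: str) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_file, target)

    def download(self, key: str) -> Optional[str]:
        target = self.root / key
        return target.read_text() if target.exists() else None


class TaskLogService:
    """Writes task logs locally and, if a remote store is configured, remotely too."""

    def __init__(self, local_dir: Path, remote: Optional[DirRemoteStore]) -> None:
        self.local_dir = local_dir
        self.remote = remote  # None means remote logging is disabled

    def write(self, key: str, content: str) -> None:
        path = self.local_dir / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
        if self.remote is not None:
            self.remote.upload(path, key)

    def read(self, key: str) -> Optional[str]:
        # Prefer the local file; fall back to remote storage if it is gone.
        path = self.local_dir / key
        if path.exists():
            return path.read_text()
        if self.remote is not None:
            return self.remote.download(key)
        return None


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    service = TaskLogService(base / "local", DirRemoteStore(base / "remote"))
    service.write("workflow-1/task-1.log", "task finished\n")
    (base / "local" / "workflow-1" / "task-1.log").unlink()  # simulate a lost worker
    print(service.read("workflow-1/task-1.log"))
```

With this layout the reader of a log does not need to know where it lives, which is one way to keep local and remote files compatible.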

caishunfeng avatar Feb 28 '22 07:02 caishunfeng

@dailidong I will try to give a design this week, thx : )

EricGao888 avatar Mar 07 '22 03:03 EricGao888

@caishunfeng I will take these two points (how to write the files, and how to read local and remote files compatibly) into consideration. Thx for the suggestions!

EricGao888 avatar Mar 07 '22 03:03 EricGao888

Yes, logging into remote storage is a good idea when workers are deployed in the cloud. We also need to consider whether the log target is global or whether different workflows can configure their own log targets. If DolphinScheduler needs to run on a hybrid cloud in the future, I think each workflow should be able to configure its own log target.
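A minimal sketch of the "global versus per-workflow" choice, in Python for illustration only. `LogTarget` and `resolve_log_target` are hypothetical names, not anything from DolphinScheduler; the idea is simply that a workflow-level setting, when present, overrides the global default:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LogTarget:
    kind: str      # illustrative labels such as "local", "s3", "oss"
    location: str  # directory or bucket path


def resolve_log_target(global_target: LogTarget,
                       overrides: Dict[str, LogTarget],
                       workflow: str) -> LogTarget:
    """Return the workflow's own target if one is configured, else the global one."""
    return overrides.get(workflow, global_target)


if __name__ == "__main__":
    default = LogTarget("s3", "s3://example-bucket/ds-logs")  # made-up location
    per_workflow = {"etl_on_prem": LogTarget("local", "/var/log/ds")}
    print(resolve_log_target(default, per_workflow, "etl_on_prem"))
    print(resolve_log_target(default, per_workflow, "etl_in_cloud"))
```

A global default keeps simple deployments simple, while the override map covers the hybrid-cloud case mentioned above.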

EricJoy2048 avatar Mar 15 '22 06:03 EricJoy2048

Could anyone please help add a discussion label to this? Thx~

EricGao888 avatar Jul 05 '22 09:07 EricGao888

Sure, I have added it.

caishunfeng avatar Jul 12 '22 02:07 caishunfeng

Thx~

EricGao888 avatar Jul 12 '22 02:07 EricGao888

Hi, @EricGao888 , I am interested in this issue.

rickchengx avatar Oct 13 '22 09:10 rickchengx

Sure, looking forward to your design : )

EricGao888 avatar Oct 13 '22 09:10 EricGao888

Hi, @EricGao888 , I am interested in this issue.

hstdream avatar Oct 13 '22 12:10 hstdream

my thoughts:

  1. Workflow completes aggregation log
  2. Worker completes log aggregation
  3. Local and remote check and store time by setting time interval through configuration
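One reading of the third point, sketched in Python for illustration only: a periodic job compares local and remote logs and cleans up local files once a configured interval has passed. The function name and parameters are hypothetical. A local log is treated as safe to delete only if it is older than the retention interval and a copy already exists remotely:

```python
from typing import Dict, List, Set


def local_logs_safe_to_delete(local_mtimes: Dict[str, float],
                              remote_keys: Set[str],
                              now: float,
                              retention_seconds: float) -> List[str]:
    """Pick local log keys that are older than the retention interval
    and already present in remote storage, so deleting them loses nothing."""
    return sorted(
        key for key, mtime in local_mtimes.items()
        if now - mtime > retention_seconds and key in remote_keys
    )
```

The retention interval would come from configuration; logs that have not yet been uploaded are never selected, regardless of age.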

hstdream avatar Oct 13 '22 12:10 hstdream

This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] avatar Nov 13 '24 00:11 github-actions[bot]

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.

github-actions[bot] avatar Nov 20 '24 00:11 github-actions[bot]