[FLINK-33434][runtime-web] Support invoke async-profiler on TaskManager via REST API

Open yuchen-ecnu opened this issue 1 year ago • 2 comments

What is the purpose of the change

This is a subtask of FLIP-375, which introduces async-profiler into Flink. This change adds support for profiling the TaskManager.

Brief change log

  • Generalize the TaskManager file upload to support multiple FileType values (each FileType can have its own baseDir)
  • Introduce REST APIs on the TaskManager for creating profiling instances, downloading profiling results, and retrieving the profiling list
  • Provide a page in the Flink Web UI for profiling a TaskManager
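
To illustrate the first item, here is a minimal sketch of how a per-type base directory lookup might work. It is not the code from this PR: the class name FileTypeBaseDirs, the enum constants, and the resolve method are hypothetical names chosen for illustration, and the containment check is an assumption about what a reasonable implementation would do before serving a requested file.

```java
import java.nio.file.Path;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: maps each file type to its own base directory. */
final class FileTypeBaseDirs {

    /** Illustrative file types; the constants are assumptions, not taken from the PR. */
    public enum FileType { LOG, PROFILER }

    private final Map<FileType, Path> baseDirs = new EnumMap<>(FileType.class);

    FileTypeBaseDirs(Path logDir, Path profilerDir) {
        // Normalize once so that later containment checks compare like with like.
        baseDirs.put(FileType.LOG, logDir.toAbsolutePath().normalize());
        baseDirs.put(FileType.PROFILER, profilerDir.toAbsolutePath().normalize());
    }

    /** Resolves fileName under the base directory registered for the given type. */
    Path resolve(FileType type, String fileName) {
        Path baseDir = baseDirs.get(type);
        Path resolved = baseDir.resolve(fileName).normalize();
        // Reject names such as "../secret" or absolute paths that leave the base directory.
        if (!resolved.startsWith(baseDir)) {
            throw new IllegalArgumentException(
                    "File name escapes base directory: " + fileName);
        }
        return resolved;
    }
}
```

With directories such as /tmp/flink/log and /tmp/flink/profiler (example values only), resolve(FileType.PROFILER, "result.html") would return a path under the profiler directory, while a name containing ".." that leaves the directory would be rejected.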

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not documented yet; documentation will be added in FLINK-33436

yuchen-ecnu avatar Jan 08 '24 05:01 yuchen-ecnu

Hi @Myasuka , do you have time to help review this PR?

yuchen-ecnu avatar Jan 08 '24 05:01 yuchen-ecnu

CI report:

  • 1b5ac2dfc0ae9c6186e0c7845075d9bc7f15d30c Azure: SUCCESS

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Jan 08 '24 05:01 flinkbot

Hi @Myasuka , I have replaced Time with Duration in the updated code. Please take a look and let me know if there are any other problems. Thanks.

yuchen-ecnu avatar Jan 15 '24 14:01 yuchen-ecnu

Hi @Myasuka , I have reverted the changes to the deprecated function requestTaskManagerFileUploadByName. However, TaskManagerProfilingFileHandler still takes a Time timeout because its base class LeaderRetrievalHandler requires one. Since almost all handlers extend that base class, I think we should replace it in a separate PR in the future.

yuchen-ecnu avatar Jan 16 '24 11:01 yuchen-ecnu

Hi @Myasuka , I have added two more tests for TaskManagerProfilingHandler and TaskManagerProfilingListHandler.

yuchen-ecnu avatar Jan 17 '24 14:01 yuchen-ecnu

@flinkbot run azure

yuchen-ecnu avatar Jan 18 '24 04:01 yuchen-ecnu