[FLINK-33681] Reuse input/output metrics of SourceOperator/SinkWriterOperator for task
What is the purpose of the change
Currently, the numRecordsIn & numBytesIn metrics for sources and the numRecordsOut & numBytesOut metrics for sinks are always 0 on the Flink web dashboard.
FLINK-11576 introduced these metrics at the operator level, but did not integrate them at the task level. The summary metrics on the job overview page, however, are based on the task-level I/O metrics. As a result, even though new connectors supporting FLIP-33 metrics report operator-level I/O metrics, those metrics still do not show up on the dashboard.
This PR attempts to reuse the operator-level source/sink I/O metrics for the task so that they can be viewed on the Flink web dashboard.
Brief change log
Reuse input/output metrics of SourceOperator/SinkWriterOperator for task.
Because Flink previously accounted only for internal traffic in the input/output bytes metrics, the reuse does not cause duplication in the I/O bytes metrics. Likewise, because the output records metric is intentionally dropped for SinkWriterOperator in OperatorChain#getOperatorRecordsOutCounter and no input records metric is collected for SourceOperatorStreamTask, no duplication occurs in the I/O records metrics.
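The reuse idea can be illustrated with a minimal, self-contained sketch. It does not use Flink's classes: SimpleCounter, SourceOperatorSketch, TaskIoMetrics and all method names below are hypothetical stand-ins chosen for illustration only. It assumes the task-level counters are what the dashboard summarizes, and that nothing else increments them for a source task, so pointing them at the operator's counters cannot double count.

```java
/** Illustrative sketch only; every class here is a hypothetical stand-in, not Flink's API. */
final class MetricReuseSketch {

    /** Hypothetical counter with the usual inc/getCount shape. */
    static final class SimpleCounter {
        private long count;

        void inc(long n) {
            count += n;
        }

        long getCount() {
            return count;
        }
    }

    /** Hypothetical source operator: counts records and bytes read from the external system. */
    static final class SourceOperatorSketch {
        final SimpleCounter numRecordsIn = new SimpleCounter();
        final SimpleCounter numBytesIn = new SimpleCounter();

        void read(byte[] record) {
            numRecordsIn.inc(1);
            numBytesIn.inc(record.length);
        }
    }

    /** Hypothetical task-level I/O metrics, assumed to be what the dashboard shows. */
    static final class TaskIoMetrics {
        // Without reuse these stay at 0 for a source task, since nothing else increments them.
        private SimpleCounter numRecordsIn = new SimpleCounter();
        private SimpleCounter numBytesIn = new SimpleCounter();

        // Share the operator's counter object instead of keeping a separate, never-updated one.
        void reuseRecordsInputCounter(SimpleCounter operatorCounter) {
            numRecordsIn = operatorCounter;
        }

        void reuseBytesInputCounter(SimpleCounter operatorCounter) {
            numBytesIn = operatorCounter;
        }

        long getNumRecordsIn() {
            return numRecordsIn.getCount();
        }

        long getNumBytesIn() {
            return numBytesIn.getCount();
        }
    }

    public static void main(String[] args) {
        SourceOperatorSketch source = new SourceOperatorSketch();
        TaskIoMetrics task = new TaskIoMetrics();
        task.reuseRecordsInputCounter(source.numRecordsIn);
        task.reuseBytesInputCounter(source.numBytesIn);

        source.read(new byte[5]);
        source.read(new byte[11]);

        System.out.println("task numRecordsIn = " + task.getNumRecordsIn());
        System.out.println("task numBytesIn = " + task.getNumBytesIn());
    }
}
```

In this sketch the task-level view reports 2 records and 16 bytes after the two reads, while a TaskIoMetrics instance without the reuse calls would keep reporting 0, which mirrors the symptom described above.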
Verifying this change
Manually ran a Kafka2Kafka job on a test cluster on K8s and verified that the source/sink input/output metrics can be seen on the web dashboard.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
- The serializers: (yes / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
- The S3 file system connector: (yes / no / don't know)
Documentation
- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
CI report:
- 662dea4e946cb0f2e287dcd7048897b93d9a46aa Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build
@affo Thanks! @huwh Could you help take a look at the PR?
Hi @gyfora, do you think this is a good way to go? Many Flink beginners have been complaining about this here and there for years. Given that we already have the right stats in the operator-level metrics, it would be good to expose them on the UI directly.
@becketqin Thanks for reviewing. I'll add UTs to cover the change. I've already raised FLIP-460 for it, but it has not received any discussion yet. Could you help review the FLIP? cc @affo @roncohen
@becketqin Hi, I've added test cases and rebased it onto the latest master. Could you take a look again?