incubator-uniffle
[Feature Request] Introduce the unique trace id to debug easily
Motivation
In the current codebase it is hard to tell which step of a remote request takes the most time, because neither the client nor the server logs a trace id that ties the two sides of the request together.
Plan
Maybe we could introduce a unique trace id for each remote request, generated from the time and the client machine id, and record it in the client's log. When the client calls the remote server, the trace id should be propagated to the server and recorded in the server's log as well.
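To make the plan concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical and not taken from the Uniffle codebase: the class `TraceIdSketch`, the methods `newTraceId`, `clientLog`, and `serverLog`, and the log format are illustrative only. The per-process sequence counter is an added assumption so that two requests issued in the same millisecond still get different ids.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Uniffle code: build a trace id from the client
// machine id and the request time, then tag client and server log lines with it.
final class TraceIdSketch {
    // Assumption: a per-process counter disambiguates requests that start
    // within the same millisecond on the same machine.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    static String newTraceId(String clientMachineId, long timestampMillis) {
        return clientMachineId + "-" + timestampMillis + "-" + SEQUENCE.incrementAndGet();
    }

    // Client side: record the trace id before sending the request.
    static String clientLog(String traceId, String message) {
        return "[client][traceId=" + traceId + "] " + message;
    }

    // Server side: the trace id arrives with the request (for example as an
    // extra request field) and is recorded in the same format.
    static String serverLog(String traceId, String message) {
        return "[server][traceId=" + traceId + "] " + message;
    }

    public static void main(String[] args) {
        String traceId = newTraceId("client-host-1", System.currentTimeMillis());
        System.out.println(clientLog(traceId, "sending shuffle data"));
        System.out.println(serverLog(traceId, "received shuffle data"));
    }
}
```

With both sides logging the same id, searching the client and server logs for one `traceId` value would show every recorded step of that request.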
Will the feature influence the performance?
Maybe it will. But we could introduce a new config option to enable this debugging, with a default of false.
Does Flink have a similar trace system? Spark doesn't have one. We once implemented this feature, but we chose not to merge it. We only used it to collect some information, and the complexity it brings outweighs the benefit. For an offline system, a metrics system may be a better choice than a trace system, because even if a single RPC is slow, the average may still be fast. My understanding is that we should add more Spark metrics to show the RSS's performance.
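As a rough illustration of the point about averages, the sketch below summarizes a batch of RPC latencies. The class `LatencySummarySketch` and the sample numbers are made up for illustration and do not come from any real measurement.

```java
import java.util.Arrays;

// Hypothetical sketch: one slow RPC barely moves the mean latency of a batch.
final class LatencySummarySketch {
    static double meanMillis(long[] latenciesMillis) {
        return Arrays.stream(latenciesMillis).average().orElse(0.0);
    }

    static long maxMillis(long[] latenciesMillis) {
        return Arrays.stream(latenciesMillis).max().orElse(0L);
    }

    public static void main(String[] args) {
        // Made-up sample: 99 RPCs at 10 ms and a single RPC at 1000 ms.
        long[] latencies = new long[100];
        Arrays.fill(latencies, 10L);
        latencies[99] = 1000L;
        System.out.println("mean ms = " + meanMillis(latencies));
        System.out.println("max ms = " + maxMillis(latencies));
    }
}
```

For this sample the mean is (99 × 10 + 1000) / 100 = 19.9 ms even though one call took a full second, which is why aggregate metrics suit an offline job better than tracing individual requests.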
Got your point. A metrics system is better for performance observation.
And I think I can look into this topic by submitting some PRs later.