incubator-uniffle

[Feature Request] Introduce a unique trace ID to make debugging easier

Open zuston opened this issue 3 years ago • 4 comments

Motivation

In the current codebase it is hard to tell which step of a remote request costs the most time, because there is no trace ID shared between the client and server sides.

Plan

Maybe we could introduce a unique trace ID for each remote request, generated from the time and the client machine ID, and record it in the client's log. When sending the request to the remote server, we should propagate this ID to the server so that it is recorded in the server's log as well.
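To make the plan concrete, here is a minimal sketch in Python rather than the project's own code. It assumes the ID is built from a millisecond timestamp, the client hostname, and a counter, and that it travels as a `trace_id` field on the request. The names `new_trace_id`, `client_send`, and `server_handle` are hypothetical, and a plain function call stands in for the real RPC.

```python
import itertools
import logging
import socket
import time

log = logging.getLogger("trace_sketch")

# Counter that keeps IDs distinct when several are created in the same millisecond.
_counter = itertools.count()


def new_trace_id(machine_id=None):
    """Build an ID from the current time, the client machine ID, and a counter."""
    machine_id = machine_id or socket.gethostname()
    return f"{int(time.time() * 1000)}-{machine_id}-{next(_counter)}"


def client_send(payload, handler, machine_id=None):
    """Client side: create the ID, log it, and attach it to the request."""
    trace_id = new_trace_id(machine_id)
    log.info("[traceId=%s] client sending request", trace_id)
    request = {"trace_id": trace_id, "payload": payload}
    start = time.monotonic()
    response = handler(request)  # stands in for the real RPC call
    elapsed = time.monotonic() - start
    log.info("[traceId=%s] client got response in %.3f s", trace_id, elapsed)
    return trace_id, response


def server_handle(request):
    """Server side: read the propagated ID and log it with the work done."""
    trace_id = request["trace_id"]
    log.info("[traceId=%s] server handling request", trace_id)
    return {"trace_id": trace_id, "result": len(request["payload"])}
```

With the same `traceId` in both logs, searching for one ID would show the client-side and server-side timestamps of a single request side by side.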

zuston avatar Jul 22 '22 08:07 zuston

Will this feature affect performance?

jerqi avatar Jul 25 '22 12:07 jerqi

Maybe it will. But we could introduce a new config option to enable this debugging, with the default set to false.
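A rough sketch of that gating, in Python rather than the project's own code. The config key `rss.debug.trace.enabled` is a made-up name for illustration, not an existing option, and `uuid4` is only a stand-in for whatever ID scheme is chosen. When the flag is off, no ID is generated and nothing extra is logged or sent.

```python
import logging
import uuid

log = logging.getLogger("trace_sketch")

# Hypothetical config key; the thread does not name one.
TRACE_ENABLED_KEY = "rss.debug.trace.enabled"
TRACE_ENABLED_DEFAULT = False  # tracing stays off unless explicitly enabled


def trace_enabled(conf):
    """Read the flag from a dict-like config, falling back to the default."""
    return bool(conf.get(TRACE_ENABLED_KEY, TRACE_ENABLED_DEFAULT))


def maybe_trace_id(conf):
    """Return a new ID only if tracing is on; otherwise None so callers skip the work."""
    if not trace_enabled(conf):
        return None
    return uuid.uuid4().hex  # stand-in ID generator for this sketch


def build_request(conf, payload):
    """Attach and log a trace ID only when the flag is enabled."""
    request = {"payload": payload}
    trace_id = maybe_trace_id(conf)
    if trace_id is not None:
        request["trace_id"] = trace_id
        log.info("[traceId=%s] sending request", trace_id)
    return request
```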

zuston avatar Jul 25 '22 14:07 zuston

Does Flink have a similar trace system? Spark doesn't. We once implemented this feature, but we chose not to merge it. We only used it to gather some information; the complexity the feature brings outweighs its benefit. For an offline system, a metrics system may be a better choice than a trace system, because even if a single RPC is slow, the average may still be fast. My understanding is that we should add more Spark metrics to show the RSS's performance.

jerqi avatar Jul 25 '22 15:07 jerqi

Got your point. A metrics system is better for observing performance.

I think I can look into this topic by submitting some PRs later.

zuston avatar Jul 25 '22 15:07 zuston