[FEA] support time-range nsys like profiling for specified stages

Open yuanqingz opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe.
Currently, spark-rapids can only profile a whole stage, selected by stage ID through the setting spark.rapids.profile.stages described here. For production queries with very large input data, a single stage can last tens of minutes or even hours. Profiling the whole stage makes generating the result file quite time-consuming, and executor scheduling behavior can lead to an early exit before the result file is successfully flushed.

Describe the solution you'd like
A config that attaches a time range to each specified stage ID, for example:

spark.rapids.profile.stages=10:0-30,12:10-70

This would profile the first 30 seconds of stage 10 and, for stage 12, wait 10 seconds and then profile the following 60 seconds.
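As an illustration only, the requested `stage:start-end` syntax could be parsed roughly as follows. This is a minimal Python sketch, not spark-rapids code: the names `StageWindow` and `parse_stage_windows` are hypothetical, and it assumes that a bare stage ID (no `:`) keeps today's whole-stage behavior and that both offsets are whole seconds measured from the start of the stage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageWindow:
    """Hypothetical holder for one entry of the proposed config value."""
    stage_id: int
    start_s: int = 0             # seconds after the stage starts
    end_s: Optional[int] = None  # None means "profile until the stage ends"


def parse_stage_windows(value: str) -> list[StageWindow]:
    """Parse the proposed syntax, e.g. "10:0-30,12:10-70"."""
    windows = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        stage_part, sep, window_part = entry.partition(":")
        stage_id = int(stage_part)
        if not sep:
            # Assumption: a bare stage ID means "profile the whole stage".
            windows.append(StageWindow(stage_id))
            continue
        start_text, dash, end_text = window_part.partition("-")
        if not dash:
            raise ValueError(f"expected start-end after ':' in {entry!r}")
        start_s, end_s = int(start_text), int(end_text)
        if end_s <= start_s:
            raise ValueError(f"invalid time window in {entry!r}")
        windows.append(StageWindow(stage_id, start_s, end_s))
    return windows
```

With the example value from the request, `parse_stage_windows("10:0-30,12:10-70")` yields one window of 0 to 30 seconds for stage 10 and one of 10 to 70 seconds for stage 12.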

yuanqingz avatar Jun 19 '24 03:06 yuanqingz

@yuanqingz Is this for LORE or something else? It is not clear from the title or description.

revans2 avatar Jun 24 '24 15:06 revans2

@revans2 It's for the built-in nsys-like profiler inside spark-rapids, controlled by the configs that start with spark.rapids.profile.*, linked here. I don't think it's LORE.

yuanqingz avatar Jun 25 '24 01:06 yuanqingz

Thanks @yuanqingz, it wasn't clear from the description.

I thought you could just combine the time-based profiling config with the stage-based profiling config https://github.com/NVIDIA/spark-rapids/blob/7bac3a6439c10efb1961d3c4ba028128d9dca249/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala#L740-L745 to get what you want.

@jlowe is that correct?

revans2 avatar Jun 25 '24 13:06 revans2

Currently the time range and job/stage range profiling configs are mutually exclusive. Even if they were allowed to be combined, it wouldn't quite do what is being requested here. IIUC the desire is to time-limit the profiling once it is triggered by job/stage-level profiling, which is different from time ranges + job/stage ranges. The latter could trigger based on time alone, which I don't think is desired here.

Combining them also has questionable interactions. For example, what if a job/stage range finishes while a time range is still active? In other words, is the combination of time + job/stage configs treated as a union or an intersection of these profile ranges? Because of this ambiguity, I decided not to deal with it in the short term until we decide what we want.
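To make the ambiguity concrete, here is a small Python sketch of the three candidate interpretations. It is illustrative only and not how spark-rapids behaves: intervals are `(start, end)` pairs in seconds on a single shared clock (an assumption made for the example), and the function names are hypothetical.

```python
from typing import Optional, Tuple

# Half-open interval [start, end) in seconds on one shared clock (assumption).
Interval = Tuple[float, float]


def intersection(a: Interval, b: Interval) -> Optional[Interval]:
    """Profile only while BOTH the stage and the time range are active."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def union(a: Interval, b: Interval) -> list[Interval]:
    """Profile while EITHER the stage or the time range is active."""
    if intersection(a, b) is None and a[1] != b[0] and b[1] != a[0]:
        return sorted([a, b])  # disjoint: two separate profiling windows
    return [(min(a[0], b[0]), max(a[1], b[1]))]


def stage_relative(stage: Interval, offset: Interval) -> Optional[Interval]:
    """Treat the time range as offsets from the stage start (the request here)."""
    shifted = (stage[0] + offset[0], stage[0] + offset[1])
    return intersection(stage, shifted)


stage = (100.0, 400.0)     # stage runs from t=100s to t=400s
time_range = (0.0, 30.0)   # configured time range

intersection(stage, time_range)    # None: nothing is profiled
union(stage, time_range)           # [(0.0, 30.0), (100.0, 400.0)]: fires on time alone
stage_relative(stage, time_range)  # (100.0, 130.0): first 30s of the stage
```

Only the stage-relative reading gives the "first 30 seconds of the stage" behavior asked for in this issue, which is why simply allowing the two existing configs together would not be enough.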

For this feature, I think we would need either a new syntax for job/stage ranges as proposed here or a new, separate config that has as many range entries as the job/stage config does to list the corresponding time limits. The latter is more complex to reason about and keep in sync, so I think extending the syntax of the existing job/stage config makes more sense.
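The "keep in sync" concern with a separate config can be sketched as below. This is a hypothetical Python illustration: no second config exists, the function name is made up, and for simplicity it assumes the stage config lists single stage IDs. The time limits are matched to stages purely by position, so any edit to one value must be mirrored in the other.

```python
def pair_stages_with_limits(stages: str, limits: str) -> dict[int, tuple[int, int]]:
    """Pair "10,12" with a hypothetical parallel value "0-30,10-70" by position."""
    stage_ids = [int(s) for s in stages.split(",")]
    windows = []
    for item in limits.split(","):
        start, _, end = item.partition("-")
        windows.append((int(start), int(end)))
    if len(stage_ids) != len(windows):
        # The two values drifted apart: there is no way to tell which
        # stage each remaining time limit was meant for.
        raise ValueError(
            f"{len(stage_ids)} stages but {len(windows)} time limits"
        )
    return dict(zip(stage_ids, windows))
```

Here `pair_stages_with_limits("10,12", "0-30,10-70")` pairs cleanly, but adding a stage to the first value without touching the second raises an error. Extending the existing job/stage syntax avoids that failure mode because each time limit sits next to the stage it applies to.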

jlowe avatar Jun 25 '24 14:06 jlowe