Support maxNumRowsPerTask in RealtimeToOfflineSegmentsTask

Open swaminathanmanish opened this issue 1 year ago • 7 comments

Problem: RealtimeToOfflineSegmentsTask, which is used to move realtime segments to offline segments, currently has no way to tune maxNumRowsPerTask, the parameter that determines how much input a single task receives. Without this configuration, we end up creating one minion task that takes in all the input (i.e., all segments that meet the criteria to be converted to offline segments), which prevents us from using other minions. There is no parallelism for this task.
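To make the request concrete, here is a minimal sketch of the kind of batching a maxNumRowsPerTask limit implies: eligible segments are packed greedily into groups whose total row count stays under the limit, and each group could then be handed to a separate minion task. This is not Pinot's implementation. `Segment`, `group_segments_by_rows`, and `max_num_rows_per_task` are hypothetical names used only for illustration, and the sketch assumes segments can be split across tasks freely, which the actual task may not allow.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for the metadata of one eligible realtime segment."""
    name: str
    num_rows: int


def group_segments_by_rows(
    segments: List[Segment], max_num_rows_per_task: int
) -> List[List[Segment]]:
    """Greedily pack segments, in order, into groups of at most
    max_num_rows_per_task rows. A single segment larger than the limit
    still gets a group of its own, since it cannot be split here."""
    if max_num_rows_per_task <= 0:
        raise ValueError("max_num_rows_per_task must be positive")

    groups: List[List[Segment]] = []
    current: List[Segment] = []
    current_rows = 0

    for segment in segments:
        # Close the current group if adding this segment would exceed the limit.
        if current and current_rows + segment.num_rows > max_num_rows_per_task:
            groups.append(current)
            current, current_rows = [], 0
        current.append(segment)
        current_rows += segment.num_rows

    if current:
        groups.append(current)
    return groups
```

With a limit in place, the number of groups (and so the number of tasks that can run on different minions) grows with the input size instead of staying fixed at one.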

swaminathanmanish avatar Apr 09 '24 16:04 swaminathanmanish

cc @aishikbh - Please take a look

swaminathanmanish avatar Apr 09 '24 16:04 swaminathanmanish

I'd like to take this up!

pratikpugalia avatar Jun 20 '24 07:06 pratikpugalia

@pratikpugalia - Would you be working on this now or taking it up later? If not, can I reassign it to someone else? Thanks!

swaminathanmanish avatar Oct 08 '24 12:10 swaminathanmanish

@swaminathanmanish I would be taking this up by the end of the month if that is okay! Sorry for the delay!

pratikpugalia avatar Oct 08 '24 23:10 pratikpugalia

@pratikpugalia - We have a slightly urgent ask to pick this up to address scaling issues in this task that are happening now. Could I or someone from my team pick this up, and you could pick something else?

swaminathanmanish avatar Oct 20 '24 14:10 swaminathanmanish

Hey @swaminathanmanish No worries, I'll look at other issues I could possibly pick up!

pratikpugalia avatar Oct 20 '24 20:10 pratikpugalia

Hey @swaminathanmanish No worries, I'll look at other issues I could possibly pick up!

Thanks @pratikpugalia

swaminathanmanish avatar Oct 21 '24 09:10 swaminathanmanish