Support maxNumRowsPerTask in RealtimeToOfflineSegmentsTask
Problem: Currently, RealtimeToOfflineSegmentsTask, which moves realtime segments to offline segments, has no way to tune maxNumRowsPerTask. This is the parameter that determines how much input a single task receives. Without this configuration, we end up creating one minion task that takes in all the input (i.e., all segments that meet the criteria to be converted to offline segments), which prevents us from using other minions. There is no parallelism for this task.
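To illustrate the requested behavior, here is a minimal sketch of how a row budget could split the eligible segments across several tasks. It is not Pinot code: the `Segment` type, the `group_segments_by_row_budget` function, and the greedy in-order grouping are all assumptions made for illustration, and the actual task generator may need a different strategy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a realtime segment's metadata."""
    name: str
    num_rows: int


def group_segments_by_row_budget(
    segments: List[Segment], max_num_rows_per_task: int
) -> List[List[Segment]]:
    """Greedily pack segments, in order, into tasks of at most
    max_num_rows_per_task rows each.

    A segment larger than the budget is not split; it gets a task of its own.
    """
    if max_num_rows_per_task <= 0:
        raise ValueError("max_num_rows_per_task must be positive")

    tasks: List[List[Segment]] = []
    current: List[Segment] = []
    current_rows = 0

    for segment in segments:
        # Close the current task if adding this segment would exceed the budget.
        if current and current_rows + segment.num_rows > max_num_rows_per_task:
            tasks.append(current)
            current = []
            current_rows = 0
        current.append(segment)
        current_rows += segment.num_rows

    # Flush the last, partially filled task.
    if current:
        tasks.append(current)
    return tasks
```

With no budget, every eligible segment lands in a single task, which is the situation described above. With a budget, each returned group could be handed to a different minion.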
cc @aishikbh - Please take a look
I'd like to take this up!
@pratikpugalia - Will you be working on this now or taking it up later? If not, can I reassign it to someone else? Thanks!
@swaminathanmanish I would be taking this up by the end of the month, if that is okay! Sorry for the delay!
@pratikpugalia - We have a slightly urgent ask to pick this up to address scaling issues in this task that are happening now. Could I or someone from my team pick this up, and could you pick something else?
Hey @swaminathanmanish, no worries, I'll look at other issues I could possibly pick up!
Thanks @pratikpugalia