samza
SAMZA-2168: Remove redundant SystemAdmin creation in ApplicationMaster
The Samza ApplicationMaster is the process responsible for scheduling, orchestrating, and managing the lifecycle of the containers of a Samza job.
The SystemAdmin abstraction is used in Samza to validate, create, and fetch the metadata of the input and metadata streams of a Samza job. Creating a SystemAdmin instance for a system is an expensive operation: it entails opening a connection to the broker and setting up resources.
Currently, the SystemAdmin for a given system is created multiple times across the different components of the Samza ApplicationMaster. This duplicate creation happens during the ApplicationMaster startup sequence and unnecessarily increases its startup time.
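To illustrate the idea of creating each admin once and sharing it across components, here is a minimal sketch. It is not the code in this patch: `Admin`, `SharedAdmins`, and `SharedAdminsDemo` are hypothetical names, and `Admin` is a simplified stand-in rather than Samza's actual SystemAdmin interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for an expensive-to-create admin client; not Samza's real interface. */
interface Admin {
    String systemName();
}

/** Creates each system's admin at most once and hands the same instance to every caller. */
final class SharedAdmins {
    private final Map<String, Admin> admins = new ConcurrentHashMap<>();
    private final Function<String, Admin> factory;

    SharedAdmins(Function<String, Admin> factory) {
        this.factory = factory;
    }

    Admin get(String systemName) {
        // computeIfAbsent invokes the factory only on the first request for a given system.
        return admins.computeIfAbsent(systemName, factory);
    }
}

public class SharedAdminsDemo {
    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        SharedAdmins shared = new SharedAdmins(name -> {
            created.incrementAndGet(); // stands in for the costly broker connection
            return () -> name;
        });

        Admin first = shared.get("kafka");
        Admin second = shared.get("kafka"); // reuses the instance created above
        shared.get("hdfs");

        System.out.println(first == second); // prints true
        System.out.println(created.get());   // prints 2
    }
}
```

In this sketch, components that previously built their own admin would instead receive the single `SharedAdmins` instance, so the expensive factory call runs once per system regardless of how many components ask for it.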
@xinyuiscool
Please take a look when you have a chance.
@xinyuiscool
- I tested this patch with the hello-samza test jobs in open-source.
- Verified that this patch works fine with samza-yarn and a Beam job at LinkedIn.
For large stateful jobs that consume many input topics from different systems, I verified that this patch reduces ApplicationMaster startup time from 1.5 minutes to 20 seconds.
Is this PR still needed?
Yes. This patch reduces the number of SystemAdmin instances created in the ApplicationMaster startup control flow, which in turn reduces ApplicationMaster startup time.
Let me fix the merge conflicts and update the patch.
@shanthoosh FYI, is this PR still relevant?