Multiple pods per job
Is your feature request related to a problem? Please describe.
I was recently testing Armada job services and had a question: the API currently rejects a JobSubmitRequestItem when more than one pod spec is specified, with the rationale presented in the code here.
This seems to imply that, at present, pods must run as separate jobs with service discovery coordinated across them, which adds a complication: services are named armada-<job-id>-<pod-index>-<service-type>, but the job id is not known at submission time. There are several ways to work around this and find the job ids at runtime when the pods all run under separate job ids, but given the choice my preference would be to run multiple pod specs within the same job, so that service discovery can be done the way the API seems to be suggesting.
Needing out-of-band k8s lookups to find the other "gang members" is not ideal because it 1) adds load to the k8s API of the executor cluster where the gang is running, and 2) adds complexity to gang application code, which has to perform the lookup. All of this is just to find the job ids needed to form service names, and then perform DNS lookups on those names (a sketch of the workaround is below). If multiple pods could be submitted within the same JobSubmitRequestItem, the out-of-band k8s lookup wouldn't be necessary, since each pod could use its own job id for service discovery and skip straight to the DNS lookup, avoiding the load and code complexity the lookup induces.
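For illustration, the out-of-band lookup workaround looks roughly like this. It is a minimal sketch assuming in-cluster access to the executor cluster; the gangId annotation key, the job id / pod index label keys, and the "headless" service-type suffix are assumptions on my part, not confirmed Armada identifiers:

```go
package gangdiscovery

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// gangServiceNames lists pods in the namespace, filters them down to the
// members of one gang, and derives the armada-<job-id>-<pod-index>-<service-type>
// service name for each member so the application can do a DNS lookup.
func gangServiceNames(ctx context.Context, namespace, gangID string) ([]string, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}

	pods, err := clientset.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}

	var names []string
	for _, pod := range pods.Items {
		// Assumed annotation key for the gang id.
		if pod.Annotations["armadaproject.io/gangId"] != gangID {
			continue
		}
		// Assumed label keys for job id and pod index; "headless" stands in
		// for the <service-type> suffix.
		jobID := pod.Labels["armada_job_id"]
		podIndex := pod.Labels["armada_pod_index"]
		names = append(names, fmt.Sprintf("armada-%s-%s-headless", jobID, podIndex))
	}
	return names, nil
}
```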
Describe the solution you'd like
The solution I would like is the ability to submit multiple pod specs in a single JobSubmitRequestItem (a hypothetical sketch is below). Are there any roadmap plans to support multiple podSpecs per job? The code comment I linked to above seems to express concern about ingress setup when multiple pods are part of the same job; is that the main concern?
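To make the ask concrete, here is a hedged sketch of what such a submission could look like if the existing PodSpecs field simply accepted more than one entry. The import path and field names are my assumptions based on the generated client types, and today the server rejects more than one entry in PodSpecs:

```go
package gangsubmit

import (
	// Import path assumed; adjust to match your Armada client version.
	"github.com/armadaproject/armada/pkg/api"
	v1 "k8s.io/api/core/v1"
)

// buildGangItem returns a single job item containing two pod specs, so both
// pods share one job id and can derive each other's service names directly.
func buildGangItem() *api.JobSubmitRequestItem {
	leader := &v1.PodSpec{
		Containers: []v1.Container{{Name: "leader", Image: "my-app:latest"}},
	}
	worker := &v1.PodSpec{
		Containers: []v1.Container{{Name: "worker", Image: "my-app:latest"}},
	}
	// Currently rejected when len(PodSpecs) > 1; the request is to allow it
	// so that armada-<job-id>-0-... and armada-<job-id>-1-... are both
	// resolvable from the shared job id with no out-of-band k8s lookup.
	return &api.JobSubmitRequestItem{
		Priority:  1,
		Namespace: "my-namespace",
		PodSpecs:  []*v1.PodSpec{leader, worker},
	}
}
```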
Describe alternatives you've considered
One alternative I have considered is to write my own service controller that watches pods and creates services keyed on the gang id instead of the job id, since every member of the gang knows that value (see the sketch below).
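A rough sketch of that controller, assuming I also attach the gang id to each pod as a label at submission time (Service selectors only match labels); the label key and service naming here are my own choices rather than Armada conventions:

```go
package gangsvc

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// Assumed label key carrying the gang id on every gang member pod.
const gangLabel = "armadaproject.io/gangId"

// run watches pods that carry the gang label and ensures a headless Service
// named after the gang id exists, so members can discover each other via DNS
// without knowing any job ids.
func run(ctx context.Context, namespace string) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	w, err := clientset.CoreV1().Pods(namespace).Watch(ctx, metav1.ListOptions{LabelSelector: gangLabel})
	if err != nil {
		return err
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		gangID := pod.Labels[gangLabel]
		svc := &corev1.Service{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("gang-%s", gangID)},
			Spec: corev1.ServiceSpec{
				// Headless: a DNS lookup of the service name returns the pod
				// IPs of all gang members selected by the gang-id label.
				ClusterIP: corev1.ClusterIPNone,
				Selector:  map[string]string{gangLabel: gangID},
			},
		}
		_, err := clientset.CoreV1().Services(namespace).Create(ctx, svc, metav1.CreateOptions{})
		if err != nil && !errors.IsAlreadyExists(err) {
			return err
		}
	}
	return nil
}
```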
@jpoler thank you for this detailed feature request! This is not currently on the 2023 roadmap, but it is a very interesting idea. Do you have thoughts about what sort of API you'd like to see here?
FYI, more info on the above (per @samclark): we had basic gang scheduling via the use of the PodSpecs property. This never worked very well and was never used in production. Because removing it would have been a breaking API change, the property was left, but its use was restricted to a single pod spec in the array.
Thanks for the context @dave-gantenbein!
we had basic gang scheduling via the use of the PodSpecs property. This never worked very well and was never used in production.
Just to clarify, is this statement specifically about gang scheduling using the plural PodSpecs property? I want to check that gang scheduling using the gangId annotation across single-pod jobs is thought to work well and is used in production?
Yes, we are using the gang scheduling feature as documented in production today.