batch: Cannot omit the start/end of target node on batch.MultiNodeContainer
Describe the feature
When deploying AWS Batch multi-node jobs with CDK, we have to specify both start_node and end_node; see https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_batch/MultiNodeContainer.html#aws_cdk.aws_batch.MultiNodeContainer. However, according to https://docs.aws.amazon.com/batch/latest/APIReference/API_NodeRangeProperty.html#API_NodeRangeProperty_Contents, start_node and end_node can each be omitted.
Use Case
Our use case is as follows: after deploying the AWS Batch infrastructure, I use boto3 from Python to submit a job like this:
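    import boto3

    # Batch client; region and credentials come from the environment
    batch_client = boto3.client("batch")

    response = batch_client.submit_job(
        jobName=job_name,
        jobQueue=job_queue,
        jobDefinition=multi_job_definition,
        parameters=job_parameters,
        # Override the node count defined in the job definition
        nodeOverrides={
            "numNodes": overridden_num_nodes,
        },
    )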
Currently this call fails with an error like:
botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the SubmitJob operation: NumNodes override can only be applied if the job definition has at least 1 target node without a range_end i.e (:) or (range_start:).
This happens because CDK requires both start_node and end_node, so every target node range in the deployed job definition has an explicit end. If CDK allowed end_node to be omitted, we could avoid this problem, and the result would still be valid according to the AWS Batch multi-node job definition. Currently the only workaround is to create another job definition based on the deployed one and modify the target nodes in the Batch console.
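To make the range syntax concrete, here is a minimal sketch of how the `targetNodes` string could be built when either bound is optional, plus a check that mirrors the wording of the error message above. The helper names `format_target_nodes` and `allows_num_nodes_override` are hypothetical and are not part of CDK or boto3; the check is only an approximation of what the service validates.

```python
from typing import Iterable, Optional


def format_target_nodes(start_node: Optional[int] = None,
                        end_node: Optional[int] = None) -> str:
    """Build a targetNodes string such as "0:3", "1:", ":2" or ":".

    An omitted start means node 0; an omitted end means the highest
    node index, per the NodeRangeProperty API reference.
    """
    start = "" if start_node is None else str(start_node)
    end = "" if end_node is None else str(end_node)
    return f"{start}:{end}"


def allows_num_nodes_override(target_nodes: Iterable[str]) -> bool:
    """Approximate the rule quoted in the error message: at least one
    range must have no range_end, i.e. ":" or "range_start:".
    """
    for node_range in target_nodes:
        if ":" not in node_range:
            continue  # a single index such as "0" has an implicit end
        _, _, end = node_range.partition(":")
        if end.strip() == "":
            return True
    return False
```

With this, `format_target_nodes(0, 0)` plus `format_target_nodes(1)` gives the ranges `"0:0"` and `"1:"`, which is the shape a numNodes override needs; what CDK emits today always looks like `"1:3"`.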
Proposed Solution
No response
Other Information
No response
Acknowledgements
- [ ] I may be able to implement this feature request
- [ ] This feature might incur a breaking change
CDK version used
2.131.0
Environment details (OS name and version, etc.)
Amazon Linux 2
> https://docs.aws.amazon.com/batch/latest/APIReference/API_NodeRangeProperty.html#API_NodeRangeProperty_Contents, it is possible to omit start_node/end_node here.
Yes, it looks like the ending node index can be omitted. I guess we need a PR to fix this.
Hi team, any updates on this issue? I see the PR has been open for several weeks.
Hey @jalencato, we are currently looking into how to fix this issue without introducing a breaking change for other users.
Hi @shikha372, is there a suggested workaround currently? I guess using a Cfn construct might work?
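One way the Cfn-level idea might look is an escape hatch on the L2 construct: take its underlying CloudFormation resource via `node.default_child` and override the synthesized `TargetNodes` string with `add_property_override`. This is an untested sketch, not a confirmed workaround. It assumes the job definition's default child is the `AWS::Batch::JobDefinition` resource and that the range sits at the given index in `NodeProperties.NodeRangeProperties`; the function name `open_end_target_nodes` is hypothetical.

```python
def open_end_target_nodes(job_definition, range_index: int = 0,
                          start_node: int = 0) -> str:
    """Rewrite one node range of a CDK multi-node job definition so it
    has no range end (e.g. "0:").

    `job_definition` is expected to be a construct such as
    batch.MultiNodeJobDefinition (assumption: its default child is the
    underlying CfnJobDefinition).
    """
    cfn_resource = job_definition.node.default_child
    target_nodes = f"{start_node}:"
    # Property path into the AWS::Batch::JobDefinition template section
    path = f"NodeProperties.NodeRangeProperties.{range_index}.TargetNodes"
    cfn_resource.add_property_override(path, target_nodes)
    return target_nodes
```

Usage would be something like `open_end_target_nodes(multi_node_job_def, range_index=1, start_node=1)` after the construct is created; inspecting the output of `cdk synth` is a reasonable way to confirm the override landed on the intended range.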