serverless-step-functions icon indicating copy to clipboard operation
serverless-step-functions copied to clipboard

AWS Batch ContainerOverrides - Command from input JSON not propagated into the job

Open konradsemsch opened this issue 5 years ago • 0 comments

This is a Bug Report

Description

I'm building a pipeline using the serverless-step-functions plugin where I'm running a container stored in ECR using AWS Batch. I'm using a single container across different steps and would like to use the ContainerOverrides property to override the Command that runs in the container in each step. However, the problem is that I'm unable to propagate the command from the input json into the job. It either renders it as a string or fails with another error (all scenarios described below).

Scenario 1

Implementation based on: link

Input json (same for both scenarios):

{
    "model_version": "0.0.1",
    "execution_type": "use_generated_data",
    "cmd_generate_data": "generate cli_config.yml --full-dataset",
    "cmd_train_model": "train cli_config.yml --use-generated-signups",
    "cmd_document": "docs"
}

Exemplary step:

TrainModel:
            Type: Task
            Resource: arn:aws:states:::batch:submitJob.sync
            Parameters:
              JobName: ${self:custom.config.batch.job_name}-train-model-job
              JobDefinition: "#{JobDefinition}"
              JobQueue: ${self:custom.config.batch.job_queue}
              ContainerOverrides:
                Command.$: "$.cmd_train_model"
                Environment:
                  - Name: "MODEL_VERSION"
                    Value.$: "$.model_version"
            ResultPath: null
            Next: GenerateDocumentation

Result: image

{
  "error": "States.Runtime",
  "cause": "An error occurred while executing the state 'TrainModel' (entered at the event id #4). The Parameters '{\"JobName\":\"train-model-job\",\"JobDefinition\":\"arn:aws:batch:eu-central-1:xxx:job-definition/train-model-BatchJob:12\",\"JobQueue\":\"xxx\",\"ContainerOverrides\":{\"Command\":\"train cli_config.yml --use-generated-signups\"}}' could not be used to start the Task: [Cannot deserialize instance of `java.util.ArrayList` out of VALUE_STRING token]"
}

Conclusion: it's able to render the command, however, it's complaining about the yml format? I'm not sure in what other form it should be provided though. Funny enough, "$.model_version" get's updated correctly. I also have a conditional step that is evaluated based on "execution_type": "use_generated_data", which also runs without problems:

          ChooseWorkflow:
            Type: Choice
            Choices:
            - Variable: "$.execution_type"
              StringEquals: "use_generated_data"
              Next: TrainModel
            - Variable: "$.execution_type"
              StringEquals: "generate_new_data"
              Next: GenerateData
            Default: DefaultState

Scenario 2

Exemplary step:

TrainModel:
            Type: Task
            Resource: arn:aws:states:::batch:submitJob.sync
            Parameters:
              JobName: ${self:custom.config.batch.job_name}-train-model-job
              JobDefinition: "#{JobDefinition}"
              JobQueue: ${self:custom.config.batch.job_queue}
              ContainerOverrides:
                Command: 
                  - "$.cmd_train_model"
                Environment:
                  - Name: "MODEL_VERSION"
                    Value.$: "$.model_version"
            ResultPath: null
            Next: GenerateDocumentation

Result: the command is rendered as a string instead of being evaluated.

image

Scenario 3

Exemplary step:

TrainModel:
            Type: Task
            Resource: arn:aws:states:::batch:submitJob.sync
            Parameters:
              JobName: ${self:custom.config.batch.job_name}-train-model-job
              JobDefinition: "#{JobDefinition}"
              JobQueue: ${self:custom.config.batch.job_queue}
              ContainerOverrides:
                Command: "$.cmd_train_model"
                Environment:
                  - Name: "MODEL_VERSION"
                    Value.$: "$.model_version"
            ResultPath: null
            Next: GenerateDocumentation

Result: fails when sls is trying to deploy

An error occurred: XXX - Invalid State Machine Definition: 'SCHEMA_VALIDATION_FAILED: Cannot deserialize instance of `java.util.ArrayList` out of VALUE_STRING token at /States/TrainModel/Parameters' (Service: AWSStepFunctions; Status Code: 400; Error Code: InvalidDefinition; Request ID: XXX; Proxy: null).

Question: could you please advise how to go about that and how to render the command from the input properly?

Additional Data

  • Serverless Framework Core Version you're using: Framework Core: 1.78.1 Plugin: 3.7.0 SDK: 2.3.1 Components: 2.34.3

  • The Plugin Version you're using: [email protected]

  • Operating System: macOS Cataliina 10.15.3 (19D76)

konradsemsch avatar Sep 02 '20 12:09 konradsemsch