AWS Batch support via mu?

Open AndreyMarchuk opened this issue 6 years ago • 5 comments

  1. What do you think about adding AWS Batch support to mu?
  2. What do you think about wrapping the Batch compute environment in a mu environment?

Example of Batch compute environment, Batch queue and Batch job definitions:

environments:
  - name: dev
    ...
    # AWS batch compute env
    batchComputeEnv:
      - name: myBatchEnv
        type: managed | unmanaged
        serviceRole: # ECS IAM role
        instanceRole: 
        keyName:  # ec2 key pair name
        # compute resources
        provisionModel: onDemand | spot
        allowedInstances: optimal | <c4.large> ... # Optimal chooses the best fit of M4, C4, and R4 instance types available in the region.
        minimumVCPUs:
        desiredVCPUs:
        maximumVCPUs:
        imageId: # AMI id
        # networking
        # omit vpcId and instanceSubnetIds here to fall back to the environment's vpcTarget settings
        vpcId: 
        instanceSubnetIds: 
        securityGroups:
    # Batch queues
    # Job queues with a higher integer value for priority are given preference for compute resources.
    # Jobs are submitted to the connected compute environments based on the order they are listed and the available capacity of those environments.
    batchQueues:
      - name: priority1
        priority: 1 
        computeEnvs:
          - myBatchEnv
      - name: priority5
        priority: 5
        computeEnvs:
          - myBatchEnv
      - name: priority10
        priority: 10
        computeEnvs:
          - myBatchEnv


# AWS batch job definitions    
batchJobs:
  - name: myBatchJob1
    jobRole: # ECS IAM role
    containerImage: amazonlinux
    command: # (optional)
    vCPUs: 2
    memory: 100
    attempts: 1
    execTimeout: 100  # Time (in seconds) to allow each job attempt to run. If your job runs longer than the specified time, it will stop and be moved to FAILED.
    uLimits:
      - name: CORE
        soft: 10
        hard: 80
    parameters:
      param1: value1
    environment:
      envvar1: val1
    # security
    privileged: true
    user: nobody
    # volumes
    volumes: 
      volname: sourcepath
    readOnlyFilesystem: false
    mountPoints:
      - containerPath: '/srv/www'
        sourcePath: '/opt/build'
        readOnly: false
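
For reference, here is a rough sketch of how a `batchJobs` entry like the one above might be translated into the body of an AWS Batch RegisterJobDefinition request. The JSON field names (`jobDefinitionName`, `containerProperties`, `retryStrategy`, `timeout.attemptDurationSeconds`, and so on) follow that API as it stood around the time of this thread; the `BatchJob` struct and `toRegisterRequest` function are hypothetical names made up for illustration and are not taken from mu or from the POC. Only a subset of the proposed fields is covered.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// BatchJob is a hypothetical in-memory form of one batchJobs entry.
type BatchJob struct {
	Name           string
	ContainerImage string
	Command        []string
	VCPUs          int
	Memory         int // MiB
	Attempts       int
	ExecTimeout    int // seconds per attempt; 0 means no timeout
	Privileged     bool
	User           string
	Environment    map[string]string
}

// The types below mirror a subset of the RegisterJobDefinition request shape.
type keyValue struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type containerProperties struct {
	Image       string     `json:"image"`
	Vcpus       int        `json:"vcpus"`
	Memory      int        `json:"memory"`
	Command     []string   `json:"command,omitempty"`
	Privileged  bool       `json:"privileged"`
	User        string     `json:"user,omitempty"`
	Environment []keyValue `json:"environment,omitempty"`
}

type retryStrategy struct {
	Attempts int `json:"attempts"`
}

type jobTimeout struct {
	AttemptDurationSeconds int `json:"attemptDurationSeconds"`
}

type registerJobDefinitionInput struct {
	JobDefinitionName   string              `json:"jobDefinitionName"`
	Type                string              `json:"type"`
	ContainerProperties containerProperties `json:"containerProperties"`
	RetryStrategy       retryStrategy       `json:"retryStrategy"`
	Timeout             *jobTimeout         `json:"timeout,omitempty"`
}

// toRegisterRequest maps the mu-style entry onto the request shape.
func toRegisterRequest(job BatchJob) registerJobDefinitionInput {
	// Sort the environment keys so the generated request is deterministic.
	keys := make([]string, 0, len(job.Environment))
	for k := range job.Environment {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	env := make([]keyValue, 0, len(keys))
	for _, k := range keys {
		env = append(env, keyValue{Name: k, Value: job.Environment[k]})
	}

	req := registerJobDefinitionInput{
		JobDefinitionName: job.Name,
		Type:              "container",
		ContainerProperties: containerProperties{
			Image:       job.ContainerImage,
			Vcpus:       job.VCPUs,
			Memory:      job.Memory,
			Command:     job.Command,
			Privileged:  job.Privileged,
			User:        job.User,
			Environment: env,
		},
		RetryStrategy: retryStrategy{Attempts: job.Attempts},
	}
	if job.ExecTimeout > 0 {
		req.Timeout = &jobTimeout{AttemptDurationSeconds: job.ExecTimeout}
	}
	return req
}

func main() {
	job := BatchJob{
		Name:           "myBatchJob1",
		ContainerImage: "amazonlinux",
		VCPUs:          2,
		Memory:         100,
		Attempts:       1,
		ExecTimeout:    100,
		Privileged:     true,
		User:           "nobody",
		Environment:    map[string]string{"envvar1": "val1"},
	}
	out, err := json.MarshalIndent(toRegisterRequest(job), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the mapping in one small function would make it easy to see which mu.yml keys are supported so far and which are still missing.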

AndreyMarchuk avatar Oct 14 '18 17:10 AndreyMarchuk

I created the following POC:

  1. Batch compute environments and job queues are handled via a mu extension
  2. Batch pipeline and job definition are handled via a new 'batch' entity in mu.yml
  3. Job definition deployment is done via mu batch deploy

Here is the code for items 2 and 3: https://github.com/stelligent/mu/compare/develop...AndreyMarchuk:feature/batch-job?expand=1

Now the question is: would it be better handled as Service.Provider = batch?

AndreyMarchuk avatar Oct 24 '18 23:10 AndreyMarchuk

I think having a Service.Provider = batch would be ideal, as it avoids a lot of duplication. How plausible would it be to implement?

cplee avatar Oct 25 '18 05:10 cplee

POC for Service.Provider = batch

  • currently usable as is (batch env provisioned via extension)
  • batch deployments (job definition registration) do not depend on an existing environment
  • could be a starting point for deployments without an LB
  • definitely fewer changes and less duplication compared to an independent batch entity implementation
  • todo: support all batch job definition params via mu.yml
  • todo: ability to provision batch environment using Environment.Provider = batch
  • todo: tests

https://github.com/stelligent/mu/compare/develop...AndreyMarchuk:batch-as-service-provider?expand=1

AndreyMarchuk avatar Oct 26 '18 01:10 AndreyMarchuk

looking good! curious, what is this for?

ProviderOverride     string                 `yaml:"provider,omitempty"`

https://github.com/stelligent/mu/compare/develop...AndreyMarchuk:batch-as-service-provider?expand=1#diff-5db7db86b937470e53496a6ce29a1d3dR160

cplee avatar Oct 26 '18 15:10 cplee

Currently mu tries to fetch the Env Stack to get the Provider from the environment. ProviderOverride allows the Provider to be specified at the Service level, so the Env Stack does not have to exist. Batch job definition registration does not depend on the environment.

service:
  name: my-batch-job
  
  # deployed as AWS Batch job
  provider: batch

It also forces the service to be treated as batch even if the user mistakenly deploys it onto a non-batch environment (e.g. ecs, ec2).
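
The lookup order described here can be sketched roughly as follows. This is a simplified illustration and not the code from the POC branch: `Service`, `resolveProvider`, and `envProviderLookup` are hypothetical names, and the lookup function only stands in for whatever mu does to read the Provider from the environment stack.

```go
package main

import (
	"errors"
	"fmt"
)

// Service holds only the fields relevant to this sketch.
type Service struct {
	Name             string
	ProviderOverride string // set from `provider:` under `service:` in mu.yml
}

// envProviderLookup is a hypothetical stand-in for fetching the
// environment stack and reading its Provider.
type envProviderLookup func(envName string) (string, error)

// resolveProvider prefers the service-level override and only falls back
// to the environment stack when no override is set.
func resolveProvider(svc Service, envName string, lookup envProviderLookup) (string, error) {
	if svc.ProviderOverride != "" {
		return svc.ProviderOverride, nil
	}
	return lookup(envName)
}

func main() {
	// Simulate an environment whose stack does not exist.
	missingStack := func(envName string) (string, error) {
		return "", errors.New("environment stack not found: " + envName)
	}

	batchSvc := Service{Name: "my-batch-job", ProviderOverride: "batch"}
	provider, err := resolveProvider(batchSvc, "dev", missingStack)
	fmt.Println(provider, err)

	plainSvc := Service{Name: "my-web-service"}
	provider, err = resolveProvider(plainSvc, "dev", missingStack)
	fmt.Println(provider, err)
}
```

With the override set, the environment lookup is never called, which is why the deployment works without an existing Env Stack and why the service is treated as batch regardless of the target environment.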

AndreyMarchuk avatar Oct 26 '18 16:10 AndreyMarchuk