aws-rfdk

RenderQueue should set Stickiness for the ALB

Open RandomInsano opened this issue 1 year ago • 0 comments

Currently, the RCS instances started by the render queue ASG do not let the ALB route a client to a specific Docker container. As a result, when Deadline makes multi-query requests (for example, to get the list of jobs), a client may jump between backing RCS instances. This effectively restarts the request from the beginning, which hurts performance.
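To make the failure mode concrete, here is a minimal toy model of it. It is a sketch, not RCS or ALB code: it assumes each RCS keeps its own in-memory transaction cache (which is what the error log below suggests), and the names `Backend`, `Router`, and `countFailedBatches` are hypothetical. The round-robin router is deliberately simplified compared with how an ALB actually distributes requests.

```typescript
// Toy model: every backend has a private transaction cache (an assumption).
class Backend {
  private readonly transactions = new Set<string>();

  constructor(readonly name: string) {}

  // Batch 1 opens the transaction; later batches need it to be cached already.
  handle(transactionId: string, batchId: number): boolean {
    if (batchId === 1) {
      this.transactions.add(transactionId);
      return true;
    }
    return this.transactions.has(transactionId);
  }
}

// A router picks a backend index for a given client and request number.
type Router = (clientId: string, requestIndex: number) => number;

// Sends `batches` sequential batches of one transaction and counts cache misses.
function countFailedBatches(route: Router, backendCount: number, batches: number): number {
  const backends = Array.from({ length: backendCount }, (_, i) => new Backend(`rcs-${i}`));
  let failures = 0;
  for (let batch = 1; batch <= batches; batch++) {
    const target = backends[route("client-a", batch - 1) % backendCount];
    if (!target.handle("txn-1", batch)) {
      failures++;
    }
  }
  return failures;
}

// Without stickiness: consecutive requests land on consecutive backends.
const roundRobin: Router = (_clientId, requestIndex) => requestIndex;

// With stickiness: a client is pinned to the first backend it was assigned.
function makeStickyRouter(): Router {
  const pinned = new Map<string, number>();
  return (clientId, requestIndex) => {
    if (!pinned.has(clientId)) {
      pinned.set(clientId, requestIndex);
    }
    return pinned.get(clientId)!;
  };
}
```

In this model, with two backends and three batches, the round-robin router sends the second batch to a backend that never saw the transaction, while the sticky router keeps every batch on the same backend.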

This was reported by a customer, so a teammate and I investigated the RenderQueue construct. It seems that we are not setting any stickiness attributes on the CfnTargetGroup instantiated here.

I believe this could work, but it would help if someone with more experience fact-checked me before I go testing it, since I have yet to start an RFDK deployment.

Reproduction Steps

I believe starting any render queue with more than two instances should start generating errors in the RCS logs.

Error Log

“The transaction ID and the batch ID of the request is abcdef12-3456-7890-9b3b-22d5af21d25a and 2 respectively, but the server don't have this transaction ID in the cache”

Environment

  • CDK CLI Version : I presume all?
  • CDK Framework Version: Also all
  • RFDK Version: Also all
  • Deadline Version: All
  • Node.js Version: All
  • OS : Not applicable
  • Language (Version): TypeScript

Other

My plan is to add this:

// CfnTargetGroup.TargetGroupAttributeProperty uses lowercase `key` and `value`.
targetGroupResource.targetGroupAttributes = [
    {
        key: "stickiness.enabled",
        value: "true"
    },
    {
        key: "stickiness.type",
        value: "lb_cookie"
    },
    {
        key: "stickiness.lb_cookie.duration_seconds",
        value: "86400" // One day (24 * 60 * 60 seconds)
    }
];
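The same attributes could also be built by a small helper, which avoids hand-typing the duration. This is only a sketch: `TargetGroupAttribute` is a local stand-in for the shape of the CDK's `CfnTargetGroup.TargetGroupAttributeProperty` (lowercase `key` and `value`), and `stickinessAttributes` and `SECONDS_PER_DAY` are hypothetical names, not part of RFDK or the CDK.

```typescript
// Local stand-in for CfnTargetGroup.TargetGroupAttributeProperty, declared
// here only so the sketch compiles without aws-cdk-lib.
interface TargetGroupAttribute {
  key: string;
  value: string;
}

const SECONDS_PER_DAY = 24 * 60 * 60; // 86400

// Hypothetical helper: returns the three ALB stickiness attributes for a
// load-balancer-generated cookie with the given lifetime.
function stickinessAttributes(durationSeconds: number = SECONDS_PER_DAY): TargetGroupAttribute[] {
  // ALB accepts lb_cookie durations from 1 second up to 7 days.
  const maxSeconds = 7 * SECONDS_PER_DAY;
  if (!Number.isInteger(durationSeconds) || durationSeconds < 1 || durationSeconds > maxSeconds) {
    throw new RangeError(`durationSeconds must be an integer between 1 and ${maxSeconds}`);
  }
  return [
    { key: "stickiness.enabled", value: "true" },
    { key: "stickiness.type", value: "lb_cookie" },
    { key: "stickiness.lb_cookie.duration_seconds", value: String(durationSeconds) },
  ];
}
```

It would be used as `targetGroupResource.targetGroupAttributes = stickinessAttributes();`. Note that assigning the array replaces any attributes already set on the resource, so if the construct sets others, merging the two lists may be the safer choice.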

This is :bug: Bug Report

RandomInsano, Nov 23 '22 23:11