
ECS Backend: Infinite retry loop when container fails to start with `minimum-seats-available: 1`

Open stefanlinner opened this issue 6 months ago • 2 comments

Description

Thank you for the recent release and the improved logging behavior for the ECS backend!

I've encountered an issue when using the ShinyProxy ECS backend with container pre-initialization. Note that this issue was not introduced with the recent ShinyProxy 3.2.0 release - I had run into this behavior before but couldn't pinpoint the exact root cause at that time. When `minimum-seats-available` is set to 1 and the allocated resources (`container-cpu-request`/`container-memory-request`) are insufficient for the application to start, the ECS cluster enters an infinite retry loop attempting to spin up the failing container.

Problem Details

Expected Behavior: Failed containers should eventually stop retrying or have a reasonable backoff/failure threshold.

Actual Behavior: The ECS cluster continuously attempts to start the failing task indefinitely, even after:

  • Correcting the resource allocation in the configuration
  • Completely removing the problematic spec entry
  • Updating the ShinyProxy ECS service with "Force new deployment"

Workaround: The only solution I found was to completely destroy and recreate the infrastructure using `tofu destroy` and `tofu apply`.

Question: Is there another way to resolve this infinite retry loop without having to destroy and recreate the entire setup?

Reproduction Steps

The following minimal example should reproduce this issue:

application.yml:

proxy:
  containerBackend: ecs
  ecs:
    name: ${CLUSTER_NAME}
    region: ${AWS_REGION}
    subnets:
      - ${SUBNET_0}
      - ${SUBNET_1}
    security-groups: ${SECURITY_GROUP}
    enable-cloud-watch: true
  specs:
    - id: dummy_app
      display-name: Test App
      description: Test App
      container-cmd: ["R", "-e", "shiny::runApp()"]
      container-image: dummy_app_image
      ecs-execution-role: arn:aws:iam::app-execution-role
      container-cpu-request: 512  # Insufficient for the app requirements
      container-memory-request: 4096
      minimum-seats-available: 1  # This triggers the infinite retry behavior

app.R (Shiny application that requires >1 CPU):

library(shiny)
library(future)

# This will fail if insufficient CPU cores are available
plan(multisession, workers = 2)

ui <- fluidPage(
  titlePanel("Simple Test App"),
  h3("App started successfully!")
)

server <- function(input, output, session) {
  # Empty server - app just needs to start
}

shinyApp(ui = ui, server = server)

Root Cause

The dummy application attempts to set up a `future::multisession` plan with 2 workers, which requires more than one available CPU core. With only 512 CPU units allocated (0.5 vCPU), the container fails to start, triggering the infinite retry behavior.
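
To illustrate the fix, here is a hypothetical adjustment of the spec from the reproduction above (the values are only an example; the accepted CPU/memory combinations depend on the ECS launch type, e.g. Fargate):

    - id: dummy_app
      display-name: Test App
      container-cmd: ["R", "-e", "shiny::runApp()"]
      container-image: dummy_app_image
      ecs-execution-role: arn:aws:iam::app-execution-role
      container-cpu-request: 2048   # 2 vCPUs, enough for plan(multisession, workers = 2)
      container-memory-request: 4096
      minimum-seats-available: 1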

Additional Question

While I have the opportunity, I'd like to ask about another behavior I've observed (not directly related to this issue):

When updating the ShinyProxy configuration and redeploying the ShinyProxy service with "Force new deployment", the configuration updates are applied correctly. However, the previously pre-initialized instances keep running and need to be stopped manually. Is there a way for tasks belonging to the previous ShinyProxy deployment to be stopped automatically during deployment updates? And more generally, are there plans for a smoother configuration update experience with ECS, something comparable to the ShinyProxy Operator?

stefanlinner avatar Jul 09 '25 20:07 stefanlinner

Hi

W.r.t. the first issue: it's indeed a known issue that ShinyProxy keeps retrying to create the container. This is something we want to improve in the next release.

The only solution I found was to completely destroy and recreate the infrastructure using `tofu destroy` and `tofu apply`.

I think it should be possible to only remove the containers and not necessarily the complete cluster and network.
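
As a rough sketch of that approach (placeholder names, assuming the AWS CLI is configured for the account), the lingering tasks can be stopped and the ShinyProxy service redeployed with the corrected configuration, without touching the rest of the infrastructure:

# Stop every running task in the cluster (this also stops the ShinyProxy task;
# the service scheduler plus the forced deployment below bring it back up)
CLUSTER=my-shinyproxy-cluster        # placeholder cluster name
for TASK in $(aws ecs list-tasks --cluster "$CLUSTER" --query 'taskArns[]' --output text); do
  aws ecs stop-task --cluster "$CLUSTER" --task "$TASK"
done

# Redeploy the ShinyProxy service so it picks up the corrected application.yml
aws ecs update-service --cluster "$CLUSTER" --service my-shinyproxy-service --force-new-deployment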

Regarding the configuration update: if you deploy Redis (or AWS ElastiCache), ShinyProxy stores the active apps and sessions in Redis. Therefore, if you restart ShinyProxy, it remembers which containers were created. If the configuration of an app hasn't changed, it can re-use the existing containers (it also remembers which users were assigned to which containers). If the configuration has changed, ShinyProxy will detect this and replace the existing containers (as soon as they aren't assigned to a user). Therefore, you don't have to manually clean up old containers.

You can read more about this here: https://shinyproxy.io/documentation/configuration/#session-and-app-persistence
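
For reference, a minimal sketch of such a configuration might look like the following (the property names are illustrative and may differ between ShinyProxy/Spring Boot versions, so follow the linked documentation for your release; the Redis endpoint is a placeholder):

proxy:
  store-mode: Redis                  # keep app and session state outside the ShinyProxy process
  stop-proxies-on-shutdown: false    # leave containers running when ShinyProxy itself restarts

spring:
  data:
    redis:
      host: my-elasticache-endpoint  # placeholder ElastiCache/Redis endpoint
      password: ${REDIS_PASSWORD}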

And more generally, are there plans for a smoother configuration update experience with ECS, something comparable to the ShinyProxy Operator?

We would love to add support for ECS to the operator; it would be a great improvement! The latest release of the operator added support for pure Docker hosts, so the architecture of the code is ready for additional backends. Although we see (through GitHub and customers) that ECS is used quite a bit, we haven't yet had any requests to add operator support. If there is more interest (or the feature is sponsored by a customer), we will definitely add support for it.

LEDfan avatar Jul 14 '25 11:07 LEDfan

Thanks for the answers! It would be great if support for ECS was added to the operator. I tried using the operator with the Docker backend, and it worked really well!

stefanlinner avatar Jul 14 '25 14:07 stefanlinner