ECS Backend: Infinite retry loop when container fails to start with `minimum-seats-available: 1`
Description
Thank you for the recent release and the improved logging behavior for the ECS backend!
I've encountered an issue when using the ShinyProxy ECS backend with container pre-initialization. Note that this issue was not introduced with the recent ShinyProxy 3.2.0 release; I had run into this behavior before but couldn't pinpoint the exact root cause at the time. When `minimum-seats-available` is set to 1 and the allocated resources (`container-cpu-request`/`container-memory-request`) are insufficient for the application to start, the ECS cluster enters an infinite retry loop attempting to spin up the failing container.
Problem Details
Expected Behavior: Failed containers should eventually stop retrying or have a reasonable backoff/failure threshold.
Actual Behavior: The ECS cluster continuously attempts to start the failing task indefinitely, even after:
- Correcting the resource allocation in the configuration
- Completely removing the problematic spec entry
- Updating the ShinyProxy ECS service with "Force new deployment"
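To illustrate what a "reasonable backoff/failure threshold" could mean here, a minimal capped exponential-backoff sketch in shell (purely hypothetical, not ShinyProxy's actual retry logic; the attempt limit and delays are made-up values):

```shell
# Hypothetical retry policy: give up after a bounded number of
# failed starts, doubling the wait between attempts.
max_attempts=5
delay=1
attempt=1
while [ "$attempt" -le "$max_attempts" ]; do
  echo "start attempt $attempt failed; backing off ${delay}s"
  # sleep "$delay"   # a real implementation would wait here
  delay=$((delay * 2))
  attempt=$((attempt + 1))
done
echo "giving up after $max_attempts failed attempts"
```

Anything along these lines would avoid the current situation where the failing task is restarted forever.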
Workaround: The only solution I found was to completely destroy and recreate the infrastructure using `tofu destroy` and `tofu apply`.
Question: Is there another way to resolve this infinite retry loop without having to destroy and recreate the entire setup?
Reproduction Steps
The following minimal example should reproduce this issue:
application.yml:
```yaml
proxy:
  containerBackend: ecs
  ecs:
    name: ${CLUSTER_NAME}
    region: ${AWS_REGION}
    subnets:
      - ${SUBNET_0}
      - ${SUBNET_1}
    security-groups: ${SECURITY_GROUP}
    enable-cloud-watch: true
  specs:
    - id: dummy_app
      display-name: Test App
      description: Test App
      container-cmd: ["R", "-e", "shiny::runApp()"]
      container-image: dummy_app_image
      ecs-execution-role: arn:aws:iam::app-execution-role
      container-cpu-request: 512 # Insufficient for the app requirements
      container-memory-request: 4096
      minimum-seats-available: 1 # This triggers the infinite retry behavior
```
app.R (Shiny application that requires >1 CPU):
```r
library(shiny)
library(future)

# This will fail if insufficient CPU cores are available
plan(multisession, workers = 2)

ui <- fluidPage(
  titlePanel("Simple Test App"),
  h3("App started successfully!")
)

server <- function(input, output, session) {
  # Empty server - app just needs to start
}

shinyApp(ui = ui, server = server)
```
Root Cause
The dummy application attempts to set up a `future` multisession plan with 2 workers, which requires more than the allocated capacity. With only 512 CPU units (0.5 vCPU) allocated, the container fails to start, triggering the infinite retry behavior.
Additional Question
While I have the opportunity, I'd like to ask about another behavior I've observed (not directly related to this issue):
When updating the ShinyProxy configuration and redeploying the ShinyProxy service with "Force new deployment", the configuration updates work correctly. However, the previous pre-initialized instances continue running and need to be stopped manually. Is there a way for tasks belonging to the previous ShinyProxy service to be stopped automatically during deployment updates? And more generally, are there plans for a smoother config-update experience with ECS, something comparable to the shinyproxy operator?
Hi
W.r.t. the first issue: it's indeed a known issue that ShinyProxy keeps retrying to create the container. This is something we want to improve in the next release.
> The only solution I found was to completely destroy and recreate the infrastructure using `tofu destroy` and `tofu apply`.
I think it should be possible to only remove the containers and not necessarily the complete cluster and network.
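For example, stopping the stuck tasks directly with the AWS CLI should work; a sketch, assuming a placeholder cluster name and that every running task in the cluster should be stopped (add a filter such as `--family` if other tasks share the cluster):

```shell
CLUSTER="my-shinyproxy-cluster"   # assumption: replace with your actual cluster name

# List the running tasks in the cluster and stop each one.
# Note: this stops ALL running tasks in the cluster.
for task in $(aws ecs list-tasks --cluster "$CLUSTER" --desired-status RUNNING \
    --query 'taskArns[]' --output text); do
  aws ecs stop-task --cluster "$CLUSTER" --task "$task" \
    --reason "removing stuck ShinyProxy app container"
done
```

This leaves the cluster, network, and the ShinyProxy service itself intact.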
Regarding the configuration update: if you deploy Redis (or AWS ElastiCache), ShinyProxy stores the active apps and sessions in Redis. Therefore, if you restart ShinyProxy, it is able to remember which containers were created. If the configuration of an app hasn't changed, it can re-use the existing containers (it also remembers which users were assigned to the containers). If the configuration has changed, ShinyProxy will detect this and replace the existing containers (as soon as they aren't assigned to a user). Therefore, you don't have to manually clean up old containers.
You can read more about this here: https://shinyproxy.io/documentation/configuration/#session-and-app-persistence
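For reference, the relevant settings look roughly like the following (a sketch based on the linked docs; double-check the exact property names against your ShinyProxy version, and the host value is a placeholder):

```yaml
proxy:
  store-mode: Redis                 # keep app/session state in Redis instead of in memory
  stop-proxies-on-shutdown: false   # leave app containers running across a ShinyProxy restart
spring:
  data:
    redis:
      host: my-elasticache-endpoint # assumption: your Redis/ElastiCache endpoint
      password: ${REDIS_PASSWORD}
```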
> And more generally, are there plans for a smoother config-update experience with ECS, something comparable to the shinyproxy operator?
We would love to add support for ECS to the operator; it would be a great improvement! The latest release of the operator added support for pure Docker hosts, so the architecture of the code is ready for additional backends. Although we see (through GitHub and customers) that ECS is used quite a bit, we haven't yet had any requests for adding operator support. If there is more interest (or the feature is supported by a customer), we will definitely add support for it.
Thanks for the answers! It would be great if support for ECS was added to the operator. I tried using the operator with the Docker backend, and it worked really well!