
Enable Multiple Traefik Replicas Across Different Availability Zones for Enhanced High Availability

Open: mcandylab opened this issue 1 year ago

What problem will this feature address?

Currently, Dokploy runs only a single instance of Traefik. Adding workers in different availability zones lets containers be replicated across the cluster, but all traffic routing still depends on that one Traefik instance on the primary server. If the primary server becomes unavailable, the whole setup loses external access, and the containers in the other zones become unreachable even though they are healthy. This creates a single point of failure.
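The single-point-of-failure argument can be made concrete with a back-of-the-envelope calculation. Assuming each Traefik node is up independently with probability `a`, and that any one healthy replica can carry all traffic, routing is lost only when all `k` replicas are down at once, so availability is `1 - (1 - a)**k`. For `a = 0.99` that is 99% with one replica and 99.99% with two. Independence is the key assumption here; spreading replicas over separate availability zones is what makes it plausible.

```python
def routing_availability(node_availability: float, replicas: int) -> float:
    """Probability that at least one Traefik replica is up.

    Simplifying assumptions: replica failures are independent (e.g. nodes in
    separate availability zones) and one healthy replica can route everything.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be in [0, 1]")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # Routing is lost only if every replica is down at the same time.
    return 1.0 - (1.0 - node_availability) ** replicas


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, f"{routing_availability(0.99, k):.6f}")
```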

Describe the solution you'd like

I would like the ability to run multiple Traefik replicas within a Dokploy-managed Swarm environment. Each replica could be placed in a different availability zone, ensuring that if one zone or server goes down, another Traefik instance can continue to route traffic to the containers still running in other zones. Ideally, Dokploy would seamlessly configure and maintain these Traefik replicas, balancing traffic and ensuring continuous availability without manual intervention.
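A replicated Traefik deployment of this kind could be expressed as a Docker Swarm service. The sketch below is hypothetical: Dokploy currently runs Traefik as a standalone `dokploy-traefik` container, and the service name, image tag, `zone` node label, and Traefik arguments are assumptions, not Dokploy's configuration. It builds a JSON body for the Docker Engine API's `POST /services/create`, spreading replicas across the values of a node label (which could be set with `docker node update --label-add zone=<az> <node>`).

```python
import json


def traefik_service_spec(replicas: int,
                         zone_label: str = "node.labels.zone",
                         image: str = "traefik:v3") -> dict:
    """Build a Docker Engine API ServiceSpec for a replicated Traefik service.

    Hypothetical sketch: name, image tag, zone label and Traefik arguments
    are assumptions chosen for illustration.
    """
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return {
        "Name": "traefik-ha",  # hypothetical service name
        "Mode": {"Replicated": {"Replicas": replicas}},
        "TaskTemplate": {
            "ContainerSpec": {
                "Image": image,
                "Args": [
                    # Traefik v3 Swarm provider, reading services via the socket.
                    "--providers.swarm.endpoint=unix:///var/run/docker.sock",
                    "--providers.swarm.exposedByDefault=false",
                    "--entrypoints.web.address=:80",
                    "--entrypoints.websecure.address=:443",
                ],
                "Mounts": [{
                    "Type": "bind",
                    "Source": "/var/run/docker.sock",
                    "Target": "/var/run/docker.sock",
                    "ReadOnly": True,
                }],
            },
            "Placement": {
                # The Swarm API is served by managers, so keep replicas there
                # (assumes at least one manager per zone).
                "Constraints": ["node.role == manager"],
                # Spread replicas evenly across the values of the zone label.
                "Preferences": [{"Spread": {"SpreadDescriptor": zone_label}}],
                # At most one replica per node, so host-mode ports never clash.
                "MaxReplicas": 1,
            },
        },
        "EndpointSpec": {
            "Ports": [
                {"Protocol": "tcp", "TargetPort": p, "PublishedPort": p,
                 "PublishMode": "host"}
                for p in (80, 443)
            ]
        },
    }


if __name__ == "__main__":
    # JSON body for POST /services/create on a manager's Engine API.
    print(json.dumps(traefik_service_spec(3), indent=2))
```

Host-mode publishing keeps each node's ports 80/443 bound to its local Traefik replica, which is what an external load balancer or a floating IP would point at; the `MaxReplicas` limit is what makes that safe.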

Describe alternatives you've considered

  • External Load Balancers: Using an external load balancer or DNS-based failover could distribute traffic across multiple zones. However, this adds complexity and external dependencies.

  • Manual Configuration of Traefik Instances: Manually deploying multiple Traefik services in the Swarm and managing their configurations independently, which can become cumbersome and error-prone.

  • Single Availability Zone Deployment: Accepting the current limitation and relying on a single zone, which increases downtime risk and may not be suitable for large or mission-critical projects.
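For the external load balancer or DNS-failover alternative, the core logic is a health check over every Traefik node: only nodes that pass stay in the pool (or in the published DNS record set). A minimal sketch, assuming each Traefik instance runs with `--ping=true` so that `/ping` is served on its `traefik` entry point (port 8080 by default); the URLs used in the example are hypothetical.

```python
import urllib.request
from typing import Callable, Iterable, List


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if url answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all land here.
        return False


def healthy_endpoints(endpoints: Iterable[str],
                      probe: Callable[[str], bool] = http_probe) -> List[str]:
    """Keep only the endpoints whose probe succeeds, preserving order."""
    return [e for e in endpoints if probe(e)]


def pick_target(endpoints: Iterable[str],
                probe: Callable[[str], bool] = http_probe) -> str:
    """Simple ordered failover: the first healthy endpoint wins."""
    healthy = healthy_endpoints(endpoints, probe)
    if not healthy:
        raise RuntimeError("no healthy Traefik instance")
    return healthy[0]


if __name__ == "__main__":
    # Hypothetical Traefik nodes in two availability zones.
    nodes = ["http://10.0.1.10:8080/ping", "http://10.0.2.10:8080/ping"]
    print(healthy_endpoints(nodes))
```

The probe is injectable so the selection logic can be tested without a network; this extra moving part is exactly the "external dependency" the bullet above refers to.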

Additional context

By enabling multiple Traefik replicas, Dokploy could better serve large-scale or high-availability environments. This feature would help ensure uninterrupted access to services, improve resilience against zone-level failures, and simplify traffic management in multi-zone infrastructures. It would also align Dokploy with modern best practices for fault tolerance and high availability in distributed applications.

Will you send a PR to implement it?

No

mcandylab, Dec 17 '24

I came here to submit an issue similar to this one. We've had many problems with our Traefik container going down and needing to be spun back up. Each time, all of our services become unreachable, even though they are consistently up and healthy.

I'm not very knowledgeable in this area, but I'd love to learn enough to contribute to this issue, or to get people much smarter than me working on it.

ghost, Dec 18 '24

It would be nice to add options so Traefik can be replicated across more nodes. Combined with keepalived, those replicas could act as a load balancer behind a VIP (virtual IP), giving us high availability. At least that's how it works in my head. Once it has been tested and the simplest way to configure it is understood, it would be great to share that configuration in the official documentation as part of setting up high availability in Dokploy.

leadvic, Feb 28 '25
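The keepalived + VIP idea from the comment above rests on VRRP: nodes elect one master that holds the VIP, and a backup takes over when the master stops advertising. A simplified model of that election, assuming each node's Traefik health is already folded into an `alive` flag (in keepalived that role is played by a `vrrp_script` referenced from `track_script`). Node names, IPs, and priorities are hypothetical, all addresses are assumed to be IPv4, and timers, preemption settings, and advertisement intervals are ignored.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    name: str
    ip: str            # primary IPv4 address of the node
    priority: int      # keepalived-style VRRP priority
    alive: bool = True  # stand-in for "Traefik health check passing"


def vrrp_master(nodes: Iterable[Node]) -> Optional[Node]:
    """Pick the node that should hold the VIP.

    Simplified VRRP rule: among live nodes the highest priority wins; on a
    tie, the node with the higher primary IP address wins.
    """
    live = [n for n in nodes if n.alive]
    if not live:
        return None  # nobody can hold the VIP
    return max(live, key=lambda n: (n.priority, ipaddress.ip_address(n.ip)))


if __name__ == "__main__":
    cluster = [
        Node("traefik-az1", "10.0.1.10", 150),
        Node("traefik-az2", "10.0.2.10", 100),
        Node("traefik-az3", "10.0.3.10", 100),
    ]
    print(vrrp_master(cluster).name)
    # Simulate Traefik failing in az1: the VIP moves to another zone.
    cluster[0] = Node("traefik-az1", "10.0.1.10", 150, alive=False)
    print(vrrp_master(cluster).name)
```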

We recently made a change that lets you add any behaviour, including this one, through the install command. Previously, Dokploy overwrote the environment variables, ports, and volumes of the Traefik container, which was a limitation. Dokploy no longer overwrites anything related to the dokploy-postgres, dokploy-redis, and dokploy-traefik containers, so you can now add these settings yourself and be confident they will be kept after the Dokploy container restarts. You have complete freedom to configure them however you want. #1061

Installation script: https://github.com/Dokploy/website/blob/main/apps/website/public/canary.sh#L109-L155

Siumauricio, Mar 24 '25

In theory, it should be possible to set up Dokploy with a Traefik replica on each server. The catch is that the file system Traefik reads from has to be shared across all of the nodes. I wonder whether we could use GlusterFS, Ceph, SeaweedFS, or even an S3 bucket to store that data.

medemi68, Oct 31 '25
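With a shared volume such as the GlusterFS, Ceph, or SeaweedFS options mentioned above, one cheap sanity check is to hash the Traefik configuration directory on every node and compare the results. A minimal sketch; which directory to point it at is an assumption (whatever path the Traefik replicas load file-provider config from). Separately, Traefik's ACME documentation warns that its certificate storage file (`acme.json`) cannot be shared across multiple Traefik instances, so certificates likely need separate handling; that is worth checking before relying on plain file sharing.

```python
import hashlib
import tempfile
from pathlib import Path


def config_digest(root: str) -> str:
    """SHA-256 over every file under root (relative path + contents).

    Files are visited in sorted relative-path order, so every node that sees
    identical content computes the same digest.
    """
    base = Path(root)
    files = sorted(
        (p.relative_to(base).as_posix(), p) for p in base.rglob("*") if p.is_file()
    )
    h = hashlib.sha256()
    for rel, path in files:
        data = path.read_bytes()
        h.update(rel.encode("utf-8") + b"\0")          # paths never contain NUL
        h.update(len(data).to_bytes(8, "big") + data)  # length-prefix the content
    return h.hexdigest()


if __name__ == "__main__":
    # Two temporary directories stand in for the shared mount on two nodes.
    with tempfile.TemporaryDirectory() as node_a, tempfile.TemporaryDirectory() as node_b:
        for root in (node_a, node_b):
            Path(root, "app.yml").write_text("http: {}\n")
        print(config_digest(node_a) == config_digest(node_b))
```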

Actually, doesn't Traefik support routing on Docker Swarm through labels?

medemi68, Oct 31 '25
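Traefik can read routing rules from Docker Swarm service labels (the `swarm` provider in Traefik v3, `providers.docker.swarmMode` in v2). Because those labels live in the Swarm cluster state rather than on one node's disk, every Traefik replica that queries the Swarm API should see the same routes, which reduces the need for a shared file system, at least for routes defined this way. A minimal sketch that builds such labels; the router name, host, port, and entry point name are hypothetical.

```python
import shlex
from typing import Dict


def traefik_swarm_labels(name: str, host: str, port: int,
                         entrypoint: str = "websecure") -> Dict[str, str]:
    """Service labels for Traefik's Swarm provider (hypothetical values)."""
    return {
        "traefik.enable": "true",
        f"traefik.http.routers.{name}.rule": f"Host(`{host}`)",
        f"traefik.http.routers.{name}.entrypoints": entrypoint,
        # Swarm gives Traefik no port information, so it must be set explicitly.
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


def label_add_args(labels: Dict[str, str]) -> str:
    """Render labels as shell-quoted `docker service update --label-add` args."""
    return " ".join(f"--label-add {shlex.quote(f'{k}={v}')}" for k, v in labels.items())


if __name__ == "__main__":
    labels = traefik_swarm_labels("myapp", "app.example.com", 3000)
    print(f"docker service update {label_add_args(labels)} myapp")
```

In Swarm mode these must be service labels (for example `deploy.labels` in a stack file), not container labels, because the Swarm provider reads labels from services.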