prefect
Two Agents on different servers connected to same Work Queue producing double results
First check
- [X] I added a descriptive title to this issue.
- [X] I used the GitHub search to find a similar issue and didn't find it.
- [X] I searched the Prefect documentation for this issue.
- [X] I checked that this issue is related to Prefect and not one of its dependencies.
Bug summary
We have gotten two Orion agents set up and running. They are both connected to the same work queue. Unfortunately, both agents pull the same jobs so everything is running on both servers and producing double results. This is not our understanding of the purpose of the work queue. For now we are going to limp along with 1 server but this is something we could really use help resolving.
Reproduction
Spin up 2 agents on separate servers connected to 1 work queue, then schedule a flow and observe that it runs on both agents.
1. We are running the Prefect 2 agent in Docker on AWS Elastic Container Service (ECS).
- The image is the prefecthq/prefect:2.3.1-python3.9 image from Docker Hub.
2. The agents are communicating with Prefect Cloud.
3. We are not customizing the agent profile except to provide the URL, API key, and work queue:
- The URL is provided via the PREFECT_API_URL env var.
- The API key is provided via the PREFECT_API_KEY env var.
- The work queue is provided as an argument to the Docker entrypoint command. We use: prefect agent start --work-queue default
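The setup above can be sketched as a single `docker run` invocation (a minimal sketch; the account/workspace IDs and API key are placeholders, and in our case ECS injects the env vars rather than the CLI):

```shell
# Launch the agent container: only the API URL, API key, and work queue are customized.
docker run -d \
  -e PREFECT_API_URL="https://api.prefect.cloud/api/accounts/<account-id>/workspaces/<workspace-id>" \
  -e PREFECT_API_KEY="<api-key>" \
  prefecthq/prefect:2.3.1-python3.9 \
  prefect agent start --work-queue default
```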
Error
No errors; just multiple flow runs when only one is expected.
Versions
2.3.1
Additional context
No response
This may be resolved by the fix for this issue: https://github.com/PrefectHQ/prefect/issues/6725
This is surprising. The API should absolutely not allow two processes to enter a running state. Two instances of the infrastructure may start, but the flow run should not run twice.
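The intended guard can be sketched generically as a compare-and-set on the run's state (this is not Prefect's actual orchestration code; `FlowRun` and `try_transition` are hypothetical names): when two agents race to start the same run, only the first transition to RUNNING is accepted and the second is rejected.

```python
from dataclasses import dataclass
from threading import Lock


@dataclass
class FlowRun:
    # Hypothetical model of a server-side flow run record.
    id: str
    state: str = "SCHEDULED"


class Orchestrator:
    """Sketch of a compare-and-set state guard: a transition is accepted
    only if the run is still in the expected prior state."""

    def __init__(self) -> None:
        self._lock = Lock()

    def try_transition(self, run: FlowRun, expected: str, new: str) -> bool:
        with self._lock:
            if run.state != expected:
                # Another agent already moved the run; reject this transition.
                return False
            run.state = new
            return True


run = FlowRun(id="abc123")
api = Orchestrator()
agent_a = api.try_transition(run, "SCHEDULED", "RUNNING")  # first agent wins
agent_b = api.try_transition(run, "SCHEDULED", "RUNNING")  # second is rejected
print(agent_a, agent_b)  # True False
```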
I'm observing the same issue. In my case, 4 agents and one Orion server run on the same machine, using the same conda environments, and I frequently (but not every time) see duplicate subflow or flow runs.
What version is your server? I believe we have resolved this with #6852
Sorry that's unreleased! It'll be out today and should close this issue :)
Awesome, will try it as soon as the new version is rolled out.
I'm running the latest, btw:
```
Version:             2.4.0
API version:         0.8.0
Python version:      3.7.9
Git commit:          513639e8
Built:               Tue, Sep 13, 2022 2:15 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         hosted
```
I believe this was fixed in #6852 which has been released