
Two Agents on different servers connected to same Work Queue producing double results

Open · robfreedy opened this issue

First check

  • [X] I added a descriptive title to this issue.
  • [X] I used the GitHub search to find a similar issue and didn't find it.
  • [X] I searched the Prefect documentation for this issue.
  • [X] I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

We have set up two Orion agents, and both are connected to the same work queue. Unfortunately, both agents pick up the same flow runs, so everything runs on both servers and produces duplicate results. This is not how we understood work queues to behave. For now we are going to limp along with one server, but this is something we could really use help resolving.

Reproduction

Spin up 2 agents on separate servers, both connected to 1 work queue, then schedule a flow and observe that it runs on both servers.

1. We are running the Prefect 2 agent in Docker on AWS Elastic Container Service (ECS).
     - The image is the prefecthq/prefect:2.3.1-python3.9 image from Docker Hub.
2. The agents are communicating with Prefect Cloud.
3. We are not customizing the agent profile except to provide the URL, API key, and work queue:
     - The URL is provided via the PREFECT_API_URL env var.
     - The API key is provided via the PREFECT_API_KEY env var.
     - The work queue is provided as an argument to the Docker entrypoint command. We use: prefect agent start --work-queue default

Error

No errors, just multiple flow runs when there should only be one.

Versions

2.3.1

Additional context

No response

robfreedy · Sep 09 '22 20:09

This issue may be resolved by this one: https://github.com/PrefectHQ/prefect/issues/6725

robfreedy · Sep 12 '22 19:09

This is surprising. The API should absolutely not allow two processes to enter a running state. Two instances of the infrastructure may start, but the flow run should not run twice.
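The guarantee described in this comment (several agents may see the same scheduled run, but only one may move it forward) can be illustrated with an atomic compare-and-set on the run's state. The following is a minimal sketch under stated assumptions, not Prefect's actual orchestration code: `FlowRun`, `propose_state`, `agent`, and `race` are hypothetical names, and the state strings only loosely mirror Prefect's SCHEDULED/PENDING/RUNNING states.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field


@dataclass
class FlowRun:
    """Hypothetical stand-in for a flow run record held by the API."""
    run_id: str
    state: str = "SCHEDULED"
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def propose_state(run: FlowRun, expected: str, new: str) -> bool:
    """Atomically move `run` from `expected` to `new`.

    Returns False (the transition is rejected) if another caller has
    already changed the state, so only one caller can win each step.
    """
    with run.lock:
        if run.state != expected:
            return False  # another agent already claimed this run
        run.state = new
        return True


def agent(name: str, run: FlowRun, winners: list[str], barrier: threading.Barrier) -> None:
    # Both agents "see" the same scheduled run at the same moment.
    barrier.wait()
    if propose_state(run, "SCHEDULED", "PENDING"):
        # Only the winning agent would go on to start infrastructure.
        winners.append(name)
        propose_state(run, "PENDING", "RUNNING")


def race() -> tuple[list[str], str]:
    """Run two hypothetical agents against one run; return winners and final state."""
    run = FlowRun("example-run")
    winners: list[str] = []
    barrier = threading.Barrier(2)
    threads = [
        threading.Thread(target=agent, args=(name, run, winners, barrier))
        for name in ("agent-a", "agent-b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners, run.state


if __name__ == "__main__":
    winners, final_state = race()
    print(winners, final_state)
```

Whichever agent acquires the lock first wins; the other's proposal is rejected because the state is no longer SCHEDULED. In this model the duplicate runs reported in this issue correspond to the server accepting the second proposal instead of rejecting it.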

zanieb · Sep 12 '22 19:09

I'm observing the same issue. In my case, there are 4 agents and one Orion server running on the same machine, using the same conda environments. I frequently (but not always) observe duplicated flow or subflow runs.

knl · Sep 22 '22 13:09

What version is your server? I believe we have resolved this with #6852.

zanieb · Sep 22 '22 13:09

Sorry, that's unreleased! It'll be out today and should close this issue :)

zanieb · Sep 22 '22 13:09


Awesome, will try it as soon as the new version is rolled out.

I'm running the latest, btw:

Version:             2.4.0
API version:         0.8.0
Python version:      3.7.9
Git commit:          513639e8
Built:               Tue, Sep 13, 2022 2:15 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         hosted

knl · Sep 22 '22 14:09

I believe this was fixed in #6852, which has been released.

jlowin · Sep 30 '22 12:09