
'No module named ...' when referencing external files from flow with DaskTaskRunner

Open cvetelinandreevdreamix opened this issue 10 months ago • 13 comments

Bug summary

Hi, all,

I'm opening a new issue as it seems my comments on the closed one (#15783) have been ignored.

I'm attaching a project you can use to reproduce the error: test-prefect-migration copy.zip

2024-12-06 12:06:12 ModuleNotFoundError: No module named 'common'

It happens when:

  * I deploy the flow to Minio using python deployment.py
  * I use a function from another module in the task I want to run with Dask

If I run the flow from the local file system (python deployment.py), then it works fine. If I deploy the flow and don't use a function imported from another module, it also works fine.

My case is that I have shared code between multiple flows, extracted into a shared module (common).

I can see the common folder is deployed to Minio.
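
For context, here is a minimal sketch of the layout and usage I mean (file and function names are illustrative, not the exact ones from the attached project):

# flows/
# ├── common/
# │   └── helpers.py      # shared code used by several flows
# ├── test_flow/
# │   └── test.py         # flow whose tasks run on Dask
# └── deployment.py       # deploys the flow to Minio storage

# test_flow/test.py (illustrative)
from prefect import flow, task
from prefect_dask import DaskTaskRunner

from common.helpers import shared_helper  # this import is what fails on the Dask workers


@task
def my_task():
    # the task calls a function defined in the shared common module
    return shared_helper()


@flow(task_runner=DaskTaskRunner())
def test_flow():
    my_task.submit()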

Let me know if you need more information. I need to migrate to Prefect 3 and use the automations.

Thank you for your time!

Version info

Version:             2.20.9
API version:         0.8.4

Python version:      3.10.10
Git commit:          b101915a
Built:               Tue, Oct 1, 2024 12:41 PM
OS/Arch:             darwin/arm64
Profile:             pd-flow-local
Server type:         server

###### Requirements with Version Specifiers ######
prefect~=2.20.9
prefect_dask~=0.2.10
prefect-docker==0.4.0

Additional context

No response

cvetelinandreevdreamix avatar Jan 30 '25 06:01 cvetelinandreevdreamix

Can I pick it up?

tsafacjo avatar Feb 01 '25 23:02 tsafacjo

Hi @tsafacjo if you have an idea for how to solve the issue, then yes please feel free to give it a go!

cicdw avatar Feb 06 '25 17:02 cicdw

Hi there,

It seems like a hard one, right?

cvetelinandreevdreamix avatar Mar 18 '25 06:03 cvetelinandreevdreamix

@cvetelinandreevdreamix point of clarification: are you saying that the bug only happens when you use the RemoteFileSystem class

@flow(task_runner=DaskTaskRunner(), result_storage=RemoteFileSystem.load('minio-results'))

but the bug doesn't occur when you use the block slug?
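
In other words, does the error only show up with the form above but not with something like this? (A sketch assuming the same block names as your example.)

@flow(task_runner=DaskTaskRunner(), result_storage='remote-file-system/minio-results')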

cicdw avatar Mar 18 '25 19:03 cicdw

@cicdw This is what I wrote initially. Thinking about it, why would this stop me from migrating ...

I'll have to check the example project I provided above again. I'll get back in a few days.

cvetelinandreevdreamix avatar Mar 20 '25 10:03 cvetelinandreevdreamix

Reading my initial comment more carefully, it says that the deployment doesn't work if I don't use the slug. The main issue is the error ModuleNotFoundError: No module named 'common' when I use a dependency from another file in the flow folder.

cvetelinandreevdreamix avatar Mar 20 '25 10:03 cvetelinandreevdreamix

I see. I haven't been able to reproduce it yet, but one thing that does help prevent errors when using the .load() method on Blocks is the _sync kwarg:

@flow(task_runner=DaskTaskRunner(), result_storage=RemoteFileSystem.load('minio-results', _sync=True))

More info on what this is doing can be found here

cicdw avatar Mar 21 '25 01:03 cicdw

I'll try on my end here and let you know.

Thank you!

cvetelinandreevdreamix avatar Mar 21 '25 06:03 cvetelinandreevdreamix

Hi @cicdw. I confirm the issue is still there after upgrading the sample project above to prefecthq/prefect:3.2.13-python3.10.

Let me know if you need assistance from my side.

What I did was:

  1. Download the file from the OP, unzip it, and open it in my IDE.
  2. Update the Prefect version in all of the Dockerfiles and the requirements.txt.
  3. Run pip3 install -r requirements.txt
  4. docker compose up
  5. cd flows && python test.py (this one works)
  6. python deployment.py && prefect deployment run 'test-flow/test_flow' (this one errors with No module named 'common')

I confirm all the flow code, including the common module, is in the Prefect bucket.

cvetelinandreevdreamix avatar Mar 21 '25 10:03 cvetelinandreevdreamix

Could you share the working directory that is being used when the deployment is run? Most likely the module you wrote is not on the path of the execution environment in Minio.
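
One quick way to check is to add a few temporary prints at the top of your flow function, along these lines (just a debugging sketch):

import os
import sys

from prefect import flow


@flow
def test_flow():
    # temporary debugging: show where the deployed flow code actually runs from
    print('working directory:', os.getcwd())
    print('directory contents:', os.listdir('.'))
    print('sys.path:', sys.path)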

cicdw avatar Mar 21 '25 18:03 cicdw

How do I do that? The flow gets stuck before executing the @flow function.

This is the log of the default worker:

2025-03-21 20:54:15 18:54:15.030 | DEBUG   | prefect.profiles - Using profile 'ephemeral'
2025-03-21 20:54:15 18:54:15.821 | DEBUG   | prefect.runner - Starting runner...
2025-03-21 20:54:15 18:54:15.842 | DEBUG   | prefect.client - Connecting to API at http://host.docker.internal:4210/api/
2025-03-21 20:54:15 18:54:15.843 | DEBUG   | prefect.runner - Limit slot acquired for flow run 'f5d7f4c3-e553-4970-9702-e0d386196c35'
2025-03-21 20:54:15 18:54:15.863 | INFO    | prefect.flow_runs.runner - Opening process...
2025-03-21 20:54:15 18:54:15.871 | DEBUG   | prefect.utilities.services.critical_service_loop - Starting run of functools.partial(<bound method Runner._check_for_cancelled_flow_runs of Runner(name='runner-c418f67c-7f67-4f95-b0fd-41ecac2f7c4b')>, should_stop=<function Runner.execute_flow_run.<locals>.<lambda> at 0xffffa9fd9750>, on_stop=<bound method CancelScope.cancel of <anyio._backends._asyncio.CancelScope object at 0xffffa9fa1780>>)
2025-03-21 20:54:15 18:54:15.872 | DEBUG   | prefect.runner - Checking for cancelled flow runs...
2025-03-21 20:54:15 18:54:15.882 | DEBUG   | prefect.client - Connecting to API at http://host.docker.internal:4210/api/
2025-03-21 20:54:16 18:54:16.902 | DEBUG   | prefect.profiles - Using profile 'ephemeral'
2025-03-21 20:54:16 18:54:16.992 | DEBUG   | Flow run 'truthful-smilodon' - Running 1 deployment pull step(s)
2025-03-21 20:54:17 18:54:17.003 | INFO    | Flow run 'truthful-smilodon' -  > Running pull_with_block step...
2025-03-21 20:54:17 18:54:17.022 | DEBUG   | prefect.client - Connecting to API at http://host.docker.internal:4210/api/
2025-03-21 20:54:17 18:54:17.029 | DEBUG   | prefect.client - Connecting to API at http://host.docker.internal:4210/api/
2025-03-21 20:54:17 18:54:17.050 | DEBUG   | prefect.client - Connecting to API at http://host.docker.internal:4210/api/
2025-03-21 20:54:17 18:54:17.072 | DEBUG   | prefect.client - Connecting to API at http://host.docker.internal:4210/api/
2025-03-21 20:54:17 18:54:17.634 | DEBUG   | prefect.events.clients - Reconnecting websocket connection.
2025-03-21 20:54:17 18:54:17.635 | DEBUG   | prefect.events.clients - Opening websocket connection.
2025-03-21 20:54:17 18:54:17.653 | DEBUG   | prefect.events.clients - Pinging to ensure websocket connected.
2025-03-21 20:54:17 18:54:17.660 | DEBUG   | prefect.events.clients - Pong received. Websocket connected.
2025-03-21 20:54:17 18:54:17.661 | DEBUG   | prefect.events.clients - Resending 0 unconfirmed events.
2025-03-21 20:54:17 18:54:17.663 | DEBUG   | prefect.events.clients - Finished resending unconfirmed events.
2025-03-21 20:54:17 18:54:17.665 | DEBUG   | prefect.client - Connecting to API at http://host.docker.internal:4210/api/
2025-03-21 20:54:17 18:54:17.671 | DEBUG   | prefect.events.clients - EventsClient(id=281473667460560): Emitting event id=9244e3e6-9805-4ca1-8416-8511034d7b14.
2025-03-21 20:54:17 18:54:17.672 | DEBUG   | prefect.events.clients - Added event id=9244e3e6-9805-4ca1-8416-8511034d7b14 to unconfirmed events list. There are now 1 unconfirmed events.
2025-03-21 20:54:17 18:54:17.673 | DEBUG   | prefect.events.clients - EventsClient(id=281473667460560): Emit reconnection attempt 0.
2025-03-21 20:54:17 18:54:17.673 | DEBUG   | prefect.events.clients - EventsClient(id=281473667460560): Sending event id=9244e3e6-9805-4ca1-8416-8511034d7b14.
2025-03-21 20:54:17 18:54:17.674 | DEBUG   | prefect.events.clients - EventsClient(id=281473667460560): Checkpointing event id=9244e3e6-9805-4ca1-8416-8511034d7b14.
2025-03-21 20:54:17 18:54:17.995 | INFO    | prefect.deployment - Pulled code using block 'remote-file-system/minio' into 'remote-file-system-minio'
2025-03-21 20:54:17 18:54:17.996 | DEBUG   | Flow run 'truthful-smilodon' - Changing working directory to 'remote-file-system-minio'
2025-03-21 20:54:18 18:54:17.999 | DEBUG   | Flow run 'truthful-smilodon' - Importing flow code from 'test_flow/test.py:test_flow'
2025-03-21 20:54:18 18:54:18.074 | DEBUG   | prefect.task_runner.dask - Starting task runner
2025-03-21 20:54:18 18:54:18.075 | INFO    | prefect.task_runner.dask - Creating a new Dask cluster with `distributed.deploy.local.LocalCluster`
2025-03-21 20:54:18 18:54:18.082 | INFO    | distributed.http.proxy - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2025-03-21 20:54:18 18:54:18.084 | INFO    | distributed.scheduler - State start
2025-03-21 20:54:18 18:54:18.095 | INFO    | distributed.scheduler -   Scheduler at:     tcp://127.0.0.1:41861
2025-03-21 20:54:18 18:54:18.095 | INFO    | distributed.scheduler -   dashboard at:  http://127.0.0.1:8787/status
2025-03-21 20:54:18 18:54:18.096 | INFO    | distributed.scheduler - Registering Worker plugin shuffle
2025-03-21 20:54:18 18:54:18.136 | INFO    | distributed.nanny -         Start Nanny at: 'tcp://127.0.0.1:41367'
2025-03-21 20:54:18 18:54:18.145 | INFO    | distributed.nanny -         Start Nanny at: 'tcp://127.0.0.1:42513'
2025-03-21 20:54:18 18:54:18.157 | INFO    | distributed.nanny -         Start Nanny at: 'tcp://127.0.0.1:38411'
2025-03-21 20:54:18 18:54:18.162 | INFO    | distributed.nanny -         Start Nanny at: 'tcp://127.0.0.1:32913'
2025-03-21 20:54:18 18:54:18.855 | INFO    | distributed.scheduler - Register worker addr: tcp://127.0.0.1:38101 name: 1
2025-03-21 20:54:18 18:54:18.858 | INFO    | distributed.scheduler - Starting worker compute stream, tcp://127.0.0.1:38101
2025-03-21 20:54:18 18:54:18.860 | INFO    | distributed.core - Starting established connection to tcp://127.0.0.1:53838
2025-03-21 20:54:18 18:54:18.861 | INFO    | distributed.scheduler - Register worker addr: tcp://127.0.0.1:37439 name: 0
2025-03-21 20:54:18 18:54:18.862 | INFO    | distributed.scheduler - Starting worker compute stream, tcp://127.0.0.1:37439
2025-03-21 20:54:18 18:54:18.862 | INFO    | distributed.core - Starting established connection to tcp://127.0.0.1:53848
2025-03-21 20:54:18 18:54:18.872 | INFO    | distributed.scheduler - Register worker addr: tcp://127.0.0.1:33929 name: 2
2025-03-21 20:54:18 18:54:18.873 | INFO    | distributed.scheduler - Starting worker compute stream, tcp://127.0.0.1:33929
2025-03-21 20:54:18 18:54:18.873 | INFO    | distributed.core - Starting established connection to tcp://127.0.0.1:53860
2025-03-21 20:54:18 18:54:18.885 | INFO    | distributed.scheduler - Register worker addr: tcp://127.0.0.1:41389 name: 3
2025-03-21 20:54:18 18:54:18.885 | INFO    | distributed.scheduler - Starting worker compute stream, tcp://127.0.0.1:41389
2025-03-21 20:54:18 18:54:18.886 | INFO    | distributed.core - Starting established connection to tcp://127.0.0.1:53866
2025-03-21 20:54:18 18:54:18.935 | INFO    | distributed.scheduler - Receive client connection: PrefectDaskClient-e4e82e8e-0685-11f0-800b-5a84e7a1df17
2025-03-21 20:54:18 18:54:18.935 | INFO    | distributed.core - Starting established connection to tcp://127.0.0.1:53872
2025-03-21 20:54:18 18:54:18.938 | INFO    | prefect.task_runner.dask - The Dask dashboard is available at http://127.0.0.1:8787/status
2025-03-21 20:54:18 18:54:18.974 | DEBUG   | prefect.events.clients - EventsClient(id=281473667460560): Emitting event id=65d9fda6-2aae-4ca0-89af-548d7f04587b.
2025-03-21 20:54:18 18:54:18.975 | DEBUG   | prefect.events.clients - Added event id=65d9fda6-2aae-4ca0-89af-548d7f04587b to unconfirmed events list. There are now 2 unconfirmed events.
2025-03-21 20:54:18 18:54:18.975 | INFO    | Flow run 'truthful-smilodon' - Beginning flow run 'truthful-smilodon' for flow 'test-flow'
2025-03-21 20:54:18 18:54:18.977 | INFO    | Flow run 'truthful-smilodon' - View at http://host.docker.internal:4210/runs/flow-run/f5d7f4c3-e553-4970-9702-e0d386196c35
2025-03-21 20:54:18 18:54:18.977 | DEBUG   | Flow run 'truthful-smilodon' - Executing flow 'test-flow' for flow run 'truthful-smilodon'...
2025-03-21 20:54:18 18:54:18.975 | DEBUG   | prefect.events.clients - EventsClient(id=281473667460560): Emit reconnection attempt 0.
2025-03-21 20:54:18 18:54:18.981 | DEBUG   | prefect.events.clients - EventsClient(id=281473667460560): Sending event id=65d9fda6-2aae-4ca0-89af-548d7f04587b.
2025-03-21 20:54:18 18:54:18.981 | INFO    | Flow run 'truthful-smilodon' - Running test_flow
2025-03-21 20:54:18 18:54:18.982 | DEBUG   | prefect.events.clients - EventsClient(id=281473667460560): Checkpointing event id=65d9fda6-2aae-4ca0-89af-548d7f04587b.
2025-03-21 20:54:19 2025-03-21 18:54:19,815 - distributed.protocol.core - CRITICAL - Failed to deserialize
2025-03-21 20:54:19 Traceback (most recent call last):
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/core.py", line 175, in loads
2025-03-21 20:54:19     return msgpack.loads(
2025-03-21 20:54:19   File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/core.py", line 172, in _decode_default
2025-03-21 20:54:19     return pickle.loads(sub_header["pickled-obj"], buffers=sub_frames)
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 92, in loads
2025-03-21 20:54:19     return pickle.loads(x)
2025-03-21 20:54:19 ModuleNotFoundError: No module named 'common'
2025-03-21 20:54:19 18:54:19.817 | INFO    | distributed.core - Connection to tcp://127.0.0.1:53838 has been closed.
2025-03-21 20:54:19 18:54:19.819 | INFO    | distributed.scheduler - Remove worker addr: tcp://127.0.0.1:38101 name: 1 (stimulus_id='handle-worker-cleanup-1742583259.819111')
2025-03-21 20:54:19 2025-03-21 18:54:19,819 - distributed.protocol.core - CRITICAL - Failed to deserialize
2025-03-21 20:54:19 Traceback (most recent call last):
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/core.py", line 175, in loads
2025-03-21 20:54:19     return msgpack.loads(
2025-03-21 20:54:19   File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/core.py", line 172, in _decode_default
2025-03-21 20:54:19     return pickle.loads(sub_header["pickled-obj"], buffers=sub_frames)
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 92, in loads
2025-03-21 20:54:19     return pickle.loads(x)
2025-03-21 20:54:19 ModuleNotFoundError: No module named 'common'
2025-03-21 20:54:19 18:54:19.823 | INFO    | distributed.nanny - Closing Nanny gracefully at 'tcp://127.0.0.1:42513'. Reason: worker-handle-scheduler-connection-broken
2025-03-21 20:54:19 18:54:19.824 | INFO    | distributed.core - Connection to tcp://127.0.0.1:53848 has been closed.
2025-03-21 20:54:19 18:54:19.824 | INFO    | distributed.scheduler - Remove worker addr: tcp://127.0.0.1:37439 name: 0 (stimulus_id='handle-worker-cleanup-1742583259.824397')
2025-03-21 20:54:19 18:54:19.825 | INFO    | distributed.batched - Batched Comm Closed <TCP (closed) Scheduler connection to worker local=tcp://127.0.0.1:41861 remote=tcp://127.0.0.1:53848>
2025-03-21 20:54:19 Traceback (most recent call last):
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/comm/tcp.py", line 298, in write
2025-03-21 20:54:19     raise StreamClosedError()
2025-03-21 20:54:19 tornado.iostream.StreamClosedError: Stream is closed
2025-03-21 20:54:19 
2025-03-21 20:54:19 The above exception was the direct cause of the following exception:
2025-03-21 20:54:19 
2025-03-21 20:54:19 Traceback (most recent call last):
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/batched.py", line 115, in _background_send
2025-03-21 20:54:19     nbytes = yield coro
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/tornado/gen.py", line 766, in run
2025-03-21 20:54:19     value = future.result()
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/comm/tcp.py", line 308, in write
2025-03-21 20:54:19     convert_stream_closed_error(self, e)
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/comm/tcp.py", line 137, in convert_stream_closed_error
2025-03-21 20:54:19     raise CommClosedError(f"in {obj}: {exc}") from exc
2025-03-21 20:54:19 distributed.comm.core.CommClosedError: in <TCP (closed) Scheduler connection to worker local=tcp://127.0.0.1:41861 remote=tcp://127.0.0.1:53848>: Stream is closed
2025-03-21 20:54:19 18:54:19.829 | INFO    | distributed.nanny - Closing Nanny gracefully at 'tcp://127.0.0.1:41367'. Reason: worker-handle-scheduler-connection-broken
2025-03-21 20:54:19 2025-03-21 18:54:19,832 - distributed.protocol.core - CRITICAL - Failed to deserialize
2025-03-21 20:54:19 Traceback (most recent call last):
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/core.py", line 175, in loads
2025-03-21 20:54:19     return msgpack.loads(
2025-03-21 20:54:19   File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/core.py", line 172, in _decode_default
2025-03-21 20:54:19     return pickle.loads(sub_header["pickled-obj"], buffers=sub_frames)
2025-03-21 20:54:19   File "/usr/local/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 92, in loads
2025-03-21 20:54:19     return pickle.loads(x)
2025-03-21 20:54:19 ModuleNotFoundError: No module named 'common'

Do you need some help running the sample project?

cvetelinandreevdreamix avatar Mar 21 '25 18:03 cvetelinandreevdreamix

Any clue?

cvetelinandreevdreamix avatar Apr 16 '25 11:04 cvetelinandreevdreamix

It seems that I'm the only one experiencing this issue. Maybe it is my dev setup? I'm wondering if anybody else has managed to run the sample project, deploy it, and run a flow successfully.

How should I debug this?

cvetelinandreevdreamix avatar May 13 '25 17:05 cvetelinandreevdreamix

The issue was that the function in the common module was not marked with the @task decorator. After adding the decorator, the ModuleNotFoundError: No module named 'common' error no longer appears and everything works fine. Thanks to all of you who put time into this.
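
In other words, the fix was roughly this (names are illustrative, not the exact ones from the sample project):

# common/helpers.py
from prefect import task


@task  # without this decorator, the Dask workers failed to deserialize the call (ModuleNotFoundError: No module named 'common')
def shared_helper(value: int) -> int:
    return value * 2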

cvetelinandreevdreamix avatar Sep 22 '25 02:09 cvetelinandreevdreamix

In my case, one of the common functions was called very often, so creating a task run for every call added overhead. You can skip that by calling common_function_annotated_with_task.fn(...), as in the example below.
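
For example (sketch):

from common.helpers import shared_helper

# calls the underlying function directly, without creating a Prefect task run
result = shared_helper.fn(21)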

cvetelinandreevdreamix avatar Sep 29 '25 12:09 cvetelinandreevdreamix