
bug: Scheduled script with relative import fails when import added to upstream

Open andnofence opened this issue 1 year ago • 15 comments

Describe the bug

When using shared code from a common module, scripts that run on a schedule and depend on that module fail if new imports are added to it.

To reproduce

  1. Create a simple piece of shared code, call it shared_code, containing:

```python
def shared_fn():
    return "Shared"
```

  2. Create a script that imports the shared code and calls it, call it uses_shared for example:

```python
from f.shared.shared_code import shared_fn

def main():
    print(shared_fn())
```

  3. Create a schedule that runs uses_shared
  4. Edit shared_code and add an import that requires an external dependency, e.g. clickhouse_connect
  5. The scheduled invocation of uses_shared fails with a ModuleNotFoundError: No module named 'clickhouse_connect'
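The edit to shared_code in the reproduction above would look something like the sketch below (clickhouse_connect stands in for any external dependency; the try/except is only there so the sketch runs and prints the failure mode when the package is absent, the real module would import it unconditionally):

```python
# Edited shared_code: the new top-level import needs an external package.
# A downstream script whose lockfile was computed before this line existed
# fails at import time on its next scheduled run.
try:
    import clickhouse_connect  # newly added external dependency
except ModuleNotFoundError as err:
    # This is the error the scheduled invocation of uses_shared reports:
    print(f"ModuleNotFoundError: No module named {err.name!r}")

def shared_fn():
    return "Shared"
```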

Expected behavior

The scheduled script should continue to run; the necessary requirements should be installed on the first run after the updated shared code is deployed.

Screenshots

No response

Browser information

No response

Application version

CE v1.227.0-2-g032a8e3dd

Additional Context

No response

andnofence avatar Dec 18 '23 20:12 andnofence


Screenshot of the error. I named my script using_shared, but otherwise it's identical to what you should get from the description above.

andnofence avatar Dec 18 '23 20:12 andnofence

Hi @andnofence , thanks for the report.

The issue is not specific to schedules. When you deploy a script, we compute the required dependencies into a lockfile, which guarantees consistent execution of that script. When that script depends on a common module, we include the module's imports when computing the lockfile.

The issue here is that we do not recompute the lockfiles of importers when you re-deploy the common module; to recompute them, one has to re-deploy each importer. What needs to be improved is that when deploying a common module, we should alert or show a modal to quickly re-deploy any downstream consumers. For this to work, we will need to expose statically that a script depends on a relative import, so that we can do that search efficiently.
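The "expose statically" part could work along the lines of this sketch (not Windmill's actual code), using the stdlib ast module to find Windmill-style relative imports (f.&lt;folder&gt;.&lt;script&gt; / u.&lt;user&gt;.&lt;script&gt;) without executing anything:

```python
# Sketch: statically detect Windmill-style relative imports in a script's
# source, so the importers of a common module can be searched efficiently.
import ast

def relative_imports(source: str) -> list[str]:
    """Return the dotted paths of f.* / u.* imports found in the source."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in ("f", "u"):
                found.append(node.module)
    return found
```

A deploy of f/shared/shared_code could then look up every script whose source yields that path.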

rubenfiszel avatar Dec 18 '23 22:12 rubenfiszel

Thanks for the explanation @rubenfiszel. That seems like a pretty big gotcha: if I have several scripts that depend on a shared module (as one would, I guess; why else use a shared module...), then adding an import to that shared module would break all the downstream scripts until they are redeployed.

I really like everything I've seen of Windmill so far, so although this was a bit of a setback I'm not ready to give up yet!

But I must admit that my knee-jerk reaction to the automatic imports feature was that it seems a bit too magic, and I personally think I would be happier just maintaining a requirements.txt like I'm used to.

So one idea I had was if it would be possible to move the dependency handling to a build step where the Docker image for the workers includes a pip install -r requirements.txt step. If I'm going to deploy this in production I will do so using a git workflow, so part of the continuous deployment process could be building and deploying this new Docker image if I detect a change to requirements.txt. The effect would be that each worker image would come with all the necessary dependencies pre-installed.

But I'm not sure this would work, it seems Windmill doesn't actually check what's installed but instead installs packages through this magic dependency resolution and maintains caches of these somewhere? So would this run the risk of conflicts between the Windmill dependency resolution and what I have installed in the Docker image?

andnofence avatar Dec 19 '23 00:12 andnofence

I agree that it is a big pain and probably one of the biggest blind spots of the current system for Python.

@andnofence you can pre-bundle all your dependencies using PIP_LOCAL_DEPENDENCIES and ADDITIONAL_PYTHON_PATHS. Simply add your pip folder to ADDITIONAL_PYTHON_PATHS and add the names of the imports to ignore to PIP_LOCAL_DEPENDENCIES. I will add a mode that integrates better with a local pip folder to avoid having to define all of this. Those env variables are described in the README.

However, I would strongly recommend using this instead: https://www.windmill.dev/docs/advanced/imports#pinning-dependencies, where you define your requirements at the script level. I understand it's a bit more painful at first, but one really strong benefit of the Windmill dependency system is that all scripts are executed in their own virtualenv, with exact versions for each script as defined by its lockfile; introducing a new dependency for one script therefore doesn't lower the version constraints for all other scripts.

rubenfiszel avatar Dec 19 '23 01:12 rubenfiszel

Adding the requirements at the script level would be acceptable, maybe even quite nice, as it keeps the dependencies close to the code that uses them. However, it doesn't appear to solve the issue raised here? I modified my shared_code like this:

```python
#requirements:
#psutil==5.9.4
import psutil

def shared_fn():
    return "Shared"
```

And when I tested it, the psutil package was pip-installed as expected. But after deploying it, my downstream using_shared code is failing again with the error ModuleNotFoundError: No module named 'psutil'

andnofence avatar Dec 19 '23 01:12 andnofence

Yes, when using #requirements, you are overriding the lockfile generation, so you need to explicitly list all of your requirements in the importer, including those of the common module. But at least there is no magic.

rubenfiszel avatar Dec 19 '23 01:12 rubenfiszel

Hm. I gave that a shot by adding the requirements to using_shared, but unfortunately it triggered a new error that I don't understand; it seems like something internal to Windmill.

New code:

```python
#requirements:
#psutil==5.9.4
from f.shared.shared_code import shared_fn

def main():
    print(shared_fn())
```

New error:

```
ExecutionErr: ExitCode: 1, last log lines:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/tmp/windmill/wk-default-1dc5dc91dd43-sqqup/018c7fb2-1743-89f8-844e-72a0aa3718e3/wrapper.py", line 8, in <module>
    from u.andre import using_shared as inner_script
  File "/tmp/windmill/wk-default-1dc5dc91dd43-sqqup/018c7fb2-1743-89f8-844e-72a0aa3718e3/u/andre/using_shared.py", line 3, in <module>
    from f.shared.shared_code import shared_fn
  File "/tmp/windmill/wk-default-1dc5dc91dd43-sqqup/018c7fb2-1743-89f8-844e-72a0aa3718e3/loader.py", line 31, in find_spec
    import requests
ModuleNotFoundError: No module named 'requests'
```

andnofence avatar Dec 19 '23 01:12 andnofence

You need to list requests as well (which is what you would do if you were using a common requirements.txt between the shared module and this script).

So:

```python
#requirements:
#psutil==5.9.4
#requests

from f.shared.shared_code import shared_fn

def main():
    print(shared_fn())
```

rubenfiszel avatar Dec 19 '23 01:12 rubenfiszel

But just to be clear, we need to improve the DX of this problem. It only happens if you use a shared module in Python and add a new dependency to it, but when you do, having to re-deploy every consumer is very cumbersome and we will remedy it.

rubenfiszel avatar Dec 19 '23 01:12 rubenfiszel

But my two scripts are not using requests? So I guess it's something that Windmill itself requires, and when using inline requirements you always have to list requests then?

But until this is improved I think I'll explore the PIP_LOCAL_DEPENDENCIES approach you mentioned.

I understand that isolating the script is a benefit, but my (very inexperienced) mental model of how I would organize code in Windmill is that I'd want to use one set of dependencies per workspace, and thus one requirements.txt and virtualenv per workspace would be fine.

andnofence avatar Dec 19 '23 01:12 andnofence

Apologies, I didn't see that requests wasn't used in your logic. We indeed use requests to resolve the local imports, and so would need it listed there, but we will fix this by making the HTTP request use urllib directly (which is part of the stdlib).

rubenfiszel avatar Dec 19 '23 02:12 rubenfiszel

After giving it more thought, I'm now convinced the best solution would be that upon deployment of the common module, all the importers of that common module get re-deployed automatically. It requires a bit of machinery but it can be done. See #2883 for tracking.

rubenfiszel avatar Dec 19 '23 03:12 rubenfiszel

Great, I've followed that issue and look forward to testing the fix!

andnofence avatar Dec 19 '23 06:12 andnofence

This is fixed for Python scripts imported by Python scripts, by maintaining a dependency map. Updating the common dependency will trigger a recomputation of the dependencies of all the dependent scripts.

What is left:

  • do the same for Bun
  • also recompute the lockfiles of flows and apps whose inline scripts could be affected by this
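The dependency-map idea can be sketched like this (illustrative Python only; names such as register_import and on_deploy are made up, the real implementation lives in the Windmill backend):

```python
# Sketch of a dependency map: record which scripts import each common
# module, and on deploy of the common module, recompute the lockfile of
# every recorded importer.
from collections import defaultdict
from typing import Callable

# path of common module -> set of importer script paths
dependency_map: dict[str, set[str]] = defaultdict(set)

def register_import(importer: str, imported: str) -> None:
    """Record, at deploy time of `importer`, that it imports `imported`."""
    dependency_map[imported].add(importer)

def on_deploy(common_path: str, recompute_lockfile: Callable[[str], None]) -> list[str]:
    """Trigger lockfile recomputation for every downstream importer."""
    importers = sorted(dependency_map.get(common_path, ()))
    for script in importers:
        recompute_lockfile(script)
    return importers
```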

rubenfiszel avatar Jan 08 '24 10:01 rubenfiszel

Hi, apologies for hijacking this. My flow runs fine from the Windmill web UI, but if I call it from the API,

{{windmill_host}}/api/w/{{workspace_name}}/jobs/run_wait_result/f/path-to-flows-name

it gives this error instead: "ExitCode: 1, last log lines:\n1 | Not found: Script not found at name f/Shared/scripts/default\n ^\nerror: Expected \";\" but found \"found\"\n at /tmp/windmill/wk-default-a7b12051c9ff-mNvkh/019123a1-afa9-903d-0c84-5e6c1141a2bd/f/Shared/scripts/default.ts.url:1:5\n1 | Not found: Script not found at name f/Shared/scripts/default\n ^\nerror: Unexpected :\n at /tmp/windmill/wk-default-a7b12051c9ff-mNvkh/019123a1-afa9-903d-0c84-5e6c1141a2bd/f/Shared/scripts/default.ts.url:1:10\nBun v1.1.18 (Linux x64 baseline)"

is this error related to this issue, or does it belong to https://github.com/windmill-labs/windmill/issues/3721 ?

razuro avatar Aug 05 '24 17:08 razuro