Automatic Julia upgrade may be surprising
It's surprising to me that juliapkg automatically picks the latest compatible Julia version from juliaup's list, even if that version isn't installed and an older compatible version is.
I'd prefer to default to using an installed version if it meets the Julia compat constraints. Partly this is because I want to have a way to keep a "recommended" version of Julia even if a newer version is available (see also #29).
Also, Julia Pkg now has the PRESERVE_TIERED_INSTALLED resolver mode that prefers to use compatible, already-installed versions of packages even if the registry has newer releases. It would be nice if juliapkg had a corresponding mode for Julia installation.
Could this behavior be made configurable?
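To make the request concrete, here is a minimal sketch of the selection rule being asked for, written as a standalone function. `choose_julia_version`, its parameters, and the tuple-based `Version` type are hypothetical (this is not juliapkg's API). The Julia compat check is abstracted into a caller-supplied predicate.

```python
from typing import Callable, Iterable, Optional, Tuple

Version = Tuple[int, int, int]

def choose_julia_version(
    installed: Iterable[Version],
    available: Iterable[Version],
    compatible: Callable[[Version], bool],
    prefer_installed: bool = True,
) -> Optional[Version]:
    """Pick a Julia version that satisfies `compatible`, or None if none does.

    prefer_installed=True: newest compatible *installed* version, falling back
    to the newest compatible release only when nothing installed qualifies.
    prefer_installed=False: newest compatible version overall, even if it
    still has to be downloaded.
    """
    installed_ok = [v for v in installed if compatible(v)]
    if prefer_installed and installed_ok:
        return max(installed_ok)
    candidates = installed_ok + [v for v in available if compatible(v)]
    return max(candidates) if candidates else None
```

With `prefer_installed=False` it reproduces the behavior described above: the newest compatible release wins even when it still has to be downloaded.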
+1
Yeah I think https://github.com/JuliaPy/pyjuliapkg/issues/29 would also help because we could fix a single recommended version. That way a new Julia release wouldn't instantly brick packages upon simply importing a Python package.
You don't wanna know how much money this cost us. When a new Julia version was released, it got downloaded automagically and individually on every node of our distributed cluster.
Agreed that a setting to prefer existing installations would be good. In the meantime you can set the environment variable PYTHON_JULIAPKG_EXE to a specific Julia if you want to avoid ever installing it.
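A minimal sketch of that workaround, assuming it runs before `juliacall` (or `juliapkg`) is first imported. `PYTHON_JULIAPKG_EXE` is the variable named above. `PYTHON_JULIAPKG_OFFLINE=yes`, which additionally stops juliapkg from resolving packages, is documented in the juliapkg README as far as I know. The helper name and the Julia path are hypothetical.

```python
import os

def pin_julia_executable(julia_exe: str, offline: bool = True) -> None:
    """Point juliapkg at an existing Julia so it never downloads one.

    Hypothetical helper: it only sets environment variables, so it must run
    before juliacall is imported (juliacall resolves Julia on import).
    """
    os.environ["PYTHON_JULIAPKG_EXE"] = julia_exe
    if offline:
        # Optionally also stop juliapkg from resolving/installing packages.
        os.environ["PYTHON_JULIAPKG_OFFLINE"] = "yes"
```

Call it at the very top of your entry point, e.g. `pin_julia_executable("/opt/julia-1.11.2/bin/julia")`, and only then `import juliacall`.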
Well, we had this in place:
```python
from juliacall import Pkg as jlPkg
from juliacall import Main as jl
...
jlPkg.offline(True)
jlPkg.activate(project)
```
So we thought we were safe...
However, `jlPkg.offline(True)` comes too late: the damage is already done as a side effect of importing juliacall (see `juliacall/__init__.py`, which calls `init()`).
So the only thing that works is going via environment variables.
You can also import juliapkg before juliacall in order to change to offline status programmatically.
> Agreed that a setting to prefer existing installations would be good.
Could we maybe make the default behavior prefer existing installations?
Import side effects are really not OK. Just don't.
The entire Julia and Julia-dependency installation in juliapkg is an import side effect... Is this what you mean?
At some point I feel like we should try to switch to doing the install at pip install time: https://github.com/JuliaPy/pyjuliapkg/issues/35 #16
Though it is indeed quite tricky due to the dynamic nature of Julia environments.
Part of the issue is that for most Python backends, the dependencies can be entirely independent. JAX has its own compiled backend. PyTorch has its own compiled backend. While they aren’t compatible with each other (you can’t call jax.numpy.sin on a torch tensor), they can be installed without knowing about each other’s existence.
Whereas for Julia backends of Python libraries, all the backend libraries can actually talk to each other and pass objects back and forth. Different backends all sit in the same combined environment, and methods from one can be compiled for objects of another, which is great for compatibility across tools. But I think this is also why it’s much easier to do this environment configuration at import time, and why it is unusual compared to traditional Python backends.
I mean, ideally I think it would be nice to set this up automatically. It’s just not as straightforward as one might expect, because of this dynamism and cross-compatibility of Julia backends for Python.
Yeah... juliapkg is trying to be the interface to Julia's package manager for all Julia packages required by python packages in the current virtual env. An alternative is to make the resolution process manually triggered - you run a command like python -m juliapkg resolve after your pip install or poetry install. This would have to harvest julia deps from the whole python virtual env somehow... which I guess might be easier if they were in pyproject.toml?
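A rough sketch of the harvesting step for such a `python -m juliapkg resolve` command. It assumes each Python package ships a `juliapkg.json` with a `"julia"` compat string and a `"packages"` table of `{"uuid": ..., "version": ...}` entries (the layout juliapkg uses, to my knowledge). Walking a single directory tree is a simplification, and `harvest_julia_deps` is a hypothetical name. Version specs are only collected here; intersecting them is left to Julia's resolver.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

def harvest_julia_deps(root: Path) -> Tuple[List[str], Dict[str, dict]]:
    """Collect Julia requirements from every juliapkg.json under `root`.

    Returns (julia_compat_specs, packages), where packages maps a package name
    to {"uuid": ..., "versions": [spec, ...]}. Specs are gathered, not merged.
    """
    julia_specs: List[str] = []
    packages: Dict[str, dict] = {}
    for path in sorted(root.rglob("juliapkg.json")):
        data = json.loads(path.read_text())
        if "julia" in data:
            julia_specs.append(data["julia"])
        for name, info in data.get("packages", {}).items():
            entry = packages.setdefault(name, {"uuid": info["uuid"], "versions": []})
            # The same package name with a different UUID is a genuine conflict.
            if entry["uuid"] != info["uuid"]:
                raise ValueError(f"{path}: conflicting UUID for {name}")
            if "version" in info:
                entry["versions"].append(info["version"])
    return julia_specs, packages
```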
It can get worse:
- I've seen the Julia side, not knowing it was running inside a Python process, start downloading and installing a Python interpreter so that it could call Python code from Julia.
- The Python process could be forked (Celery workers, gunicorn, ...), and then you have a download/install/compile going on in every process.
Also, it's manipulating the (project's) Project.toml & Manifest.toml files. It should not do this, it should just listen to them. They (should) have authority.
> Also, it's manipulating the Project.toml & Manifest.toml files. It should not do this, it should just listen to them. They (should) have authority
Manipulating the managed Project.toml and Manifest.toml is how juliapkg works. There’s no way to install stuff in Julia other than by manipulating those. But juliapkg owns and manages those files, so this shouldn’t be unexpected. I suppose what is unexpected is that they get changed at runtime, but this is kinda needed at the moment due to the dynamic and shared nature of Julia environments (see the explanation above). But yeah, it’d be nice to have this happen at pip install time, if that’s even possible.
(If you mean an externally non-managed project, this shouldn’t happen. So please submit a bug report if this does.)
But then again I suppose if the Manifest.toml is already compatible with juliapkg’s requirements, then it shouldn’t be updated. So perhaps that extra logic should be added, to prevent unintended updates.
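That extra check could look roughly like the sketch below: only trigger a resolve when some requirement is not already met by the existing Manifest.toml. It assumes the manifest has already been parsed (for example with `tomllib` on Python 3.11+) and uses manifest format 2.0, where each entry of `manifest["deps"]` is a list of tables. The function name is hypothetical, and the compat check is passed in as a predicate rather than reimplemented here.

```python
from typing import Callable, Dict, List

def stale_requirements(
    manifest: dict,
    required: Dict[str, str],
    satisfies: Callable[[str, str], bool],
) -> List[str]:
    """Names of required packages that the manifest does not already satisfy.

    `required` maps package name -> compat spec; `satisfies(version, spec)`
    decides compatibility. An empty result means no re-resolve is needed.
    """
    deps = manifest.get("deps", {})
    stale = []
    for name, spec in required.items():
        # Format 2.0 stores a list of entries per name; stdlibs may lack "version".
        versions = [e["version"] for e in deps.get(name, []) if "version" in e]
        if not any(satisfies(v, spec) for v in versions):
            stale.append(name)
    return stale
```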
> the python process could be forked... (celery workers, gunicorn, ....) then you have a download/install/compile going on per process.
So if it’s a shared filesystem, then the compile will only happen in one process because Julia will use a file locking mechanism to prevent simultaneous precompilation. But the download of Julia itself I suppose is not locked since juliapkg manages this. So we should probably put in a patch for that.
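Such a patch could serialize the download with an exclusive lock. The sketch below uses POSIX `fcntl.flock`, so it is Unix only. `install_once`, the `.lock` file next to the install directory, and the `.install-complete` marker are hypothetical choices, not existing juliapkg behavior. Note that `flock` is not reliable on every network filesystem, so a shared-filesystem cluster may need a different locking primitive.

```python
import fcntl
from pathlib import Path
from typing import Callable

def install_once(install_dir: Path, install: Callable[[Path], None]) -> bool:
    """Run install(install_dir) at most once, even across concurrent processes.

    Returns True if this call performed the install, False if another
    process (or an earlier call) had already completed it.
    """
    install_dir.parent.mkdir(parents=True, exist_ok=True)
    lock_path = install_dir.with_name(install_dir.name + ".lock")
    marker = install_dir / ".install-complete"
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Re-check under the lock: a process that waited here must not
            # download again once the first one has finished.
            if marker.exists():
                return False
            install(install_dir)
            install_dir.mkdir(parents=True, exist_ok=True)
            marker.touch()  # written only after install() returned normally
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Checking the marker only after acquiring the lock is what makes the second and later processes skip the download instead of repeating it.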
P.S. As a workaround to prevent these unintended updates, you should pin the version with:
```python
import juliapkg
juliapkg.require_julia("=1.11.2")
```
If you write "1.11.2" by itself, this actually sets a minimum version (any release from 1.11.2 up to, but not including, 2.0.0) rather than fixing it.
The same goes for other dependencies.
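To see the difference, here is a small sketch of Julia's compat rules for the two forms used above: a bare version is a caret specifier, whose upper bound bumps the first non-zero component, while a leading `=` pins one exact version. It covers only those two forms plus comma-separated unions (tilde, inequality and hyphen specifiers are left out), assumes plain `X.Y.Z` versions without pre-release tags, and the function names are hypothetical rather than juliapkg's own.

```python
from typing import Tuple

Version = Tuple[int, int, int]

def parse_version(s: str) -> Version:
    """Parse "X", "X.Y" or "X.Y.Z" into a 3-tuple, padding missing parts with 0."""
    parts = [int(p) for p in s.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])

def caret_range(spec: str) -> Tuple[Version, Version]:
    """Half-open range [lower, upper) for a caret/bare compat entry.

    The upper bound bumps the first non-zero component, or the last given
    component if all are zero: "1.11.2" -> [1.11.2, 2.0.0), "0.5" -> [0.5.0, 0.6.0).
    """
    parts = [int(p) for p in spec.split(".")]
    nonzero = [i for i, p in enumerate(parts) if p != 0]
    bump = nonzero[0] if nonzero else len(parts) - 1
    upper = parts[:bump] + [parts[bump] + 1]
    return parse_version(spec), parse_version(".".join(str(p) for p in upper))

def satisfies(version: str, spec: str) -> bool:
    """True if `version` matches any comma-separated entry of `spec`."""
    v = parse_version(version)
    for entry in (e.strip() for e in spec.split(",")):
        if entry.startswith("="):
            # "=X.Y.Z" pins exactly one version.
            if v == parse_version(entry[1:]):
                return True
        else:
            lower, upper = caret_range(entry.lstrip("^"))
            if lower <= v < upper:
                return True
    return False
```

So `satisfies("1.12.0", "1.11.2")` is `True`, which is exactly why a new 1.x release gets picked up, while `satisfies("1.12.0", "=1.11.2")` is `False`.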
> You can also import `juliapkg` before `juliacall` in order to change to offline status programmatically.
If import order matters, you're doing something wrong. Really.
@toolslive See https://github.com/JuliaPy/pyjuliapkg/issues/39#issuecomment-2594217889
I mean, this isn't even that uncommon a pattern:
```python
import matplotlib
matplotlib.use('Agg')  # Change backend
import matplotlib.pyplot as plt
```
Lots of libraries involve manipulating the backend settings before loading the package. Another example:
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import tensorflow as tf
```
Sigh. Even though the side effect of updating a global dict is probably suspect, it's limited and predictable. Compare this to downloading a kitchen sink and writing all over the file system. Anyway, we contained this behaviour via environment variables.
(However, some of our Celery workers, which run in Docker containers and call Julia, crash with a SEGV. I have my suspicions, but I still need to get my hands on a core dump before I assign blame. SNAFU.)
OK, so these are a few different things:
- Import order - As I explained, import order having some effect on behavior is kind of unavoidable in Python (unless written in pure Python - but basically nothing is). Lots of other libraries have this behavior too. Importing juliapkg before juliacall to configure it is not ideal, but there's not really any other way.
- Downloading/updating when you don't need to - This is the core issue in this thread about preferring already-installed versions. I feel like this could be easy to fix(?). But this is also separate from when we do the install.
- Install-time vs import-time - Moving everything to pip install time is a much bigger architectural change (#35/#16), and is much harder in general. This needs to be thought about separately from the other issues. (Any help is always appreciated.)
Regarding "writing all over the file system," it really only writes to:
- The Julia installation directory
- The local virtual env
Which are both required and [I would assume] expected. It's just when it installs that seems unexpected. Right?
@toolslive It sounds like you might just not want what juliapkg provides at all. I sympathize, because I am also working on a Python package with Julia dependencies in which we basically just bypass it - as you probably know, you can set certain juliacall environment variables to do this. Actually, we do use some of juliapkg. We use internal APIs to have it locate or install Julia (but not upgrade it if present!) and then switch it to offline mode completely. We tell juliacall the Julia location via the appropriate environment variable.
We manage Julia package dependencies by supplying a fixed Project.toml and Manifest.toml, which we ship with the package (to avoid surprises with upstream dependency changes we haven't tested against - every now and again some change in SciML breaks something, at least temporarily!).
The problem with this approach is that it rules out anybody using a different Python package that uses juliacall and juliapkg together with ours. Fortunately, there aren't too many of these yet!
If Julia package management were integrated with Python package management, we could probably avoid a lot of these shenanigans.