rules_conda
Recreating environments from scratch
Hey, any update on that?
Not really, but I guess disabling conda clean (with clean = False in conda_create) is a nice workaround. That way:
- conda caches all packages (the downloaded tarballs)
- if you change something in your environment.yml then Bazel will delete the environment and recreate it from scratch, but conda will first check whether the packages are in the cache and will download only what has changed
When all packages are already downloaded, simply creating the environment again is really quick. I think that alone solves the issue in practice.
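To make the caching argument concrete, here is a toy model in Python. It is only a sketch of the idea, not how conda or rules_conda is implemented: the create_env function, the set-based cache, and the package pins are all hypothetical stand-ins.

```python
# Toy model only: not conda's or Bazel's real implementation.
def create_env(spec, cache):
    """Build an environment from scratch; return (env, downloaded)."""
    downloaded = []
    env = set()
    for pkg in spec:
        if pkg not in cache:
            cache.add(pkg)          # stands in for downloading a tarball
            downloaded.append(pkg)
        env.add(pkg)                # stands in for installing from the cache
    return env, downloaded


cache = set()  # survives between runs, as it would with clean = False

# First creation: nothing is cached, so everything is fetched.
env1, first = create_env(["python=3.8", "numpy=1.19"], cache)

# The spec changed, so the environment is rebuilt from scratch,
# but only the package missing from the cache is fetched.
env2, second = create_env(["python=3.8", "numpy=1.19", "pandas=1.1"], cache)

print(first)
print(second)
```

The second call still rebuilds the whole environment, which is the part that stays cheap as long as the cache is kept.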
Otherwise, we would have to convert conda packages to Bazel packages so that Bazel can check what has changed, and that seems like too much work for so little effect (nevertheless, that's the way pip_import from rules_python works). But that approach also has the beneficial side effect of allowing sub-environments with packages only for the current target. So if anyone has a good idea how to do something like that for rules_conda without breaking the simplicity of its use, PRs are welcome.
Hi! I'm not an expert on Bazel and probably won't actually be able to use it, or else this'd be a PR, but I do know a thing or two about Conda. Are you aware of the conda env update --prune command? It'll update an existing environment to match an environment specification file exactly, adding or removing packages as necessary. So it's for syncing an environment to an updated spec, not for updating the package versions in an environment. In principle, I think you should be able to create an empty environment and then just update it when the environment specification changes.
While I'm here, I'll also mention that you can probably substantially improve your performance by using mamba rather than conda, though this will probably mean having to install it in your base environment via conda. It doesn't need to be in the dev environments though. Mamba is a version of conda with the same CLI API but a bunch of optimisations that make it much faster - solving environments in parallel and that sort of thing.
Hope that helps! And I hope I can use Bazel and your work some day!
Hi, thanks for the feedback. I'm afraid the --prune option is of no use for us. Bazel is very strict on correctness and reproducibility. A rule's outputs can depend only on its clearly defined inputs. In particular, the output of a rule can't be an input to the same rule. For example, the output of rules_conda is a conda environment, and rules_conda can't depend on previously created environments. Bazel enforces that by simply deleting the environment and calling the rule again (every time any input changes). Putting it another way: the created environment can't and shouldn't be updated after its creation. Making sure the environment is up to date is Bazel's responsibility (and it fulfills that by just recreating it from scratch). But as I said, with the conda cache it's not really an issue in practice as long as people are not short on storage space.
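A rough Python sketch of that "delete and recreate whenever an input changes" behaviour may help. This is an illustration under assumptions, not Bazel's actual mechanism: ensure_env, the state dict, and the spec.txt marker file are hypothetical names, and a SHA-256 of the spec text stands in for Bazel's input tracking.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def input_digest(spec_text: str) -> str:
    """Digest of the rule's only declared input (the spec text)."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def ensure_env(spec_text: str, out_dir: Path, state: dict) -> bool:
    """Recreate out_dir from scratch if the input changed; return True if rebuilt."""
    digest = input_digest(spec_text)
    if state.get("digest") == digest and out_dir.exists():
        return False                      # inputs unchanged: keep the output
    if out_dir.exists():
        shutil.rmtree(out_dir)            # never update the old output in place
    out_dir.mkdir(parents=True)
    (out_dir / "spec.txt").write_text(spec_text)  # stand-in for creating the env
    state["digest"] = digest
    return True


root = Path(tempfile.mkdtemp())
env_dir = root / "env"
state = {}

r1 = ensure_env("numpy\n", env_dir, state)            # first build
(env_dir / "stale.txt").write_text("leftover")        # something left in the old env
r2 = ensure_env("numpy\n", env_dir, state)            # same input: no rebuild
r3 = ensure_env("numpy\npandas\n", env_dir, state)    # input changed: rebuild
stale_survived = (env_dir / "stale.txt").exists()

shutil.rmtree(root)                                   # clean up the temp directory
```

Because the old directory is removed before the rebuild, nothing from the previous environment can leak into the new one, which is exactly why an in-place update such as --prune does not fit this model.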
As for mamba, I think it may be valuable; I will create another issue to research that topic.