rules_conda

Recreating environments from scratch

Open • kpodsiad opened this issue 3 years ago • 3 comments

Hey, any update on that?

kpodsiad, Mar 26 '21 23:03

Not really, but I think disabling conda clean (by setting clean = False in conda_create) is a good workaround. That way:

  • conda keeps all downloaded packages (the tarballs) in its cache
  • if you change something in your environment.yml, Bazel will delete the environment and recreate it from scratch, but conda will first check whether the packages are already in its cache and download only what has changed

When all packages are already downloaded, simply recreating the environment is really quick. I think that alone solves the issue in practice.
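To make the caching argument concrete, here is a toy Python model, not conda's actual implementation. The `PackageCache` class and `recreate_environment` function are hypothetical, packages are identified by illustrative `name=version` strings, and dependency solving is ignored. It shows that rebuilding an environment from scratch re-downloads only the packages missing from the cache.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PackageCache:
    """Hypothetical stand-in for conda's package cache of downloaded tarballs."""
    tarballs: set[str] = field(default_factory=set)

    def fetch(self, package: str) -> bool:
        """Return True if the package had to be downloaded, False on a cache hit."""
        if package in self.tarballs:
            return False
        self.tarballs.add(package)  # simulate downloading the tarball once
        return True


def recreate_environment(spec: list[str], cache: PackageCache) -> list[str]:
    """Build the environment from scratch; return only the packages that were downloaded."""
    return [pkg for pkg in spec if cache.fetch(pkg)]


cache = PackageCache()
first = recreate_environment(["python=3.9", "numpy=1.20", "pandas=1.2"], cache)
# environment.yml changed (numpy bumped), so the whole environment is rebuilt...
second = recreate_environment(["python=3.9", "numpy=1.21", "pandas=1.2"], cache)
print(first)   # ...the first build downloads all three packages
print(second)  # ...but the rebuild downloads only numpy=1.21
```

Disabling `clean` matters in this model because cleaning would empty `tarballs`, forcing every rebuild to download everything again.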

Otherwise, we would have to convert conda packages into Bazel packages so that Bazel can track what has changed, and that seems like too much effort for such a small gain (although that is how pip_import from rules_python works). That approach would also have a beneficial side effect: it would make it possible to have sub-environments containing only the packages needed by the current target. So if anyone has a good idea of how to do something like that for rules_conda without sacrificing its simplicity, PRs are welcome.
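The trade-off between one key for the whole environment and one key per package can be sketched in Python. This is a hypothetical model, not how rules_conda or rules_python are implemented. All function names are illustrative, SHA-256 digests stand in for Bazel's change tracking, and transitive dependencies are ignored. In practice, bumping one conda package can pull in changes to others.

```python
from __future__ import annotations

import hashlib


def digest(text: str) -> str:
    """SHA-256 of a string, used here as a stand-in for a Bazel cache key."""
    return hashlib.sha256(text.encode()).hexdigest()


def monolithic_key(specs: dict[str, str]) -> str:
    """One key for the whole environment: any change invalidates everything."""
    return digest("\n".join(f"{name}={version}" for name, version in sorted(specs.items())))


def per_package_keys(specs: dict[str, str]) -> dict[str, str]:
    """One key per package, as if each conda package were its own Bazel target."""
    return {name: digest(f"{name}={version}") for name, version in specs.items()}


def changed_packages(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Names of added or changed packages (removed packages are not reported here)."""
    old_keys, new_keys = per_package_keys(old), per_package_keys(new)
    return sorted(name for name, key in new_keys.items() if old_keys.get(name) != key)


def sub_environment(specs: dict[str, str], target_deps: list[str]) -> dict[str, str]:
    """Hypothetical per-target sub-environment: only the packages a target declares."""
    return {name: specs[name] for name in target_deps}


old = {"python": "3.9", "numpy": "1.20", "pandas": "1.2"}
new = {"python": "3.9", "numpy": "1.21", "pandas": "1.2"}
print(monolithic_key(old) != monolithic_key(new))  # True: the whole environment is invalidated
print(changed_packages(old, new))                  # ['numpy']: only one package would be rebuilt
print(sub_environment(new, ["python", "numpy"]))   # {'python': '3.9', 'numpy': '1.21'}
```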

spietras, Mar 27 '21 16:03

Hi! I'm not a Bazel expert and probably won't be able to use it myself (otherwise this would be a PR), but I do know a thing or two about conda. Are you aware of the conda env update --prune command? It updates an existing environment to match an environment specification file exactly, adding or removing packages as necessary. So it's for applying changes in an environment spec, not for updating the package versions in an environment. In principle, you should be able to create an empty environment and then update it whenever the environment specification changes.
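The "match the spec exactly" behaviour of `--prune` can be modelled as a simple diff between installed packages and the spec. This Python sketch is a simplification, not conda's algorithm. `prune_update_plan` is a hypothetical name, and dependency resolution is ignored. With a real install, the equivalent step would be something like `conda env update --prefix <env-dir> --file environment.yml --prune`.

```python
from __future__ import annotations


def prune_update_plan(installed: dict[str, str], spec: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return (packages to install or change, packages to remove) so that installed matches spec exactly."""
    to_install = {name: version for name, version in spec.items() if installed.get(name) != version}
    to_remove = sorted(name for name in installed if name not in spec)
    return to_install, to_remove


# Start from an empty environment, then apply two successive versions of the spec.
env: dict[str, str] = {}
for spec_version in ({"python": "3.9", "numpy": "1.20"}, {"python": "3.9", "scipy": "1.6"}):
    to_install, to_remove = prune_update_plan(env, spec_version)
    for name in to_remove:
        del env[name]  # --prune: drop packages no longer in the spec
    env.update(to_install)  # add new packages and change versions
    print(to_install, to_remove)
```

After the second update, numpy has been removed and scipy added, leaving exactly what the latest spec lists.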

While I'm here, I'll also mention that you can probably improve performance substantially by using mamba instead of conda, though this will probably mean installing it into your base environment via conda. It doesn't need to be in the dev environments, though. Mamba is a reimplementation of conda with the same command-line interface, plus optimisations that make it much faster, such as downloading packages in parallel and using a faster dependency solver.
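Because mamba is meant to be a drop-in replacement, a tool driving conda could, in principle, just swap the executable. The Python sketch below assumes, as the comment says, that mamba accepts the same `env create --prefix ... --file ...` arguments as conda. `pick_solver` and `create_command` are hypothetical helpers, and the `which` parameter exists only so the lookup can be tested without either tool installed.

```python
from __future__ import annotations

import shutil
from typing import Callable, Optional


def pick_solver(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Prefer mamba when it is on PATH, otherwise fall back to conda."""
    for tool in ("mamba", "conda"):
        path = which(tool)
        if path:
            return path
    raise RuntimeError("neither mamba nor conda found on PATH")


def create_command(tool: str, prefix: str, env_file: str) -> list[str]:
    """Build (but do not run) the environment-creation command; only the executable differs."""
    return [tool, "env", "create", "--prefix", prefix, "--file", env_file]


# Example with a stubbed lookup where only conda is installed:
tool = pick_solver(lambda name: "/opt/conda/bin/conda" if name == "conda" else None)
print(create_command(tool, "/tmp/my_env", "environment.yml"))
```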

Hope that helps! And I hope I can use Bazel and your work some day!

Yoshanuikabundi, Jun 29 '21 10:06

Hi, thanks for the feedback. I'm afraid the --prune option is of no use to us. Bazel is very strict about correctness and reproducibility: a rule's outputs can depend only on its clearly defined inputs. In particular, the output of a rule can't be an input to the same rule. For example, the output of rules_conda is a conda environment, so rules_conda can't depend on previously created environments. Bazel enforces this by simply deleting the environment and calling the rule again every time any input changes. Put another way, the created environment can't and shouldn't be updated after its creation. Keeping the environment up to date is Bazel's responsibility, and it fulfills that by recreating the environment from scratch. But as I said, with the conda cache this isn't really an issue in practice, as long as people aren't short on storage space.
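The "never update, always recreate" policy described above can be sketched in Python. This is only an illustration of the idea, not Bazel's mechanism. The helpers `input_key` and `ensure_environment`, the `.input_key` stamp file, and the `fake_create` callback are all hypothetical, and Bazel tracks rule inputs itself rather than through a file inside the output. The environment is keyed by a hash of its declared inputs. When that hash changes, the old output is deleted outright instead of being modified.

```python
from __future__ import annotations

import hashlib
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def input_key(env_yml: str, attrs: dict[str, object]) -> str:
    """Hash everything the environment may depend on: the spec file and the rule attributes."""
    payload = json.dumps({"env_yml": env_yml, "attrs": attrs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def ensure_environment(env_dir: Path, key: str, create: Callable[[Path], None]) -> bool:
    """Recreate env_dir from scratch if its inputs changed; never update it in place.

    Returns True if the environment was (re)created.
    """
    stamp = env_dir / ".input_key"
    if stamp.is_file() and stamp.read_text() == key:
        return False  # inputs unchanged: reuse the existing output
    if env_dir.exists():
        shutil.rmtree(env_dir)  # discard the old output entirely
    env_dir.mkdir(parents=True)
    create(env_dir)  # hypothetical callback that populates the directory
    stamp.write_text(key)
    return True


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "my_env"
    builds: list[Path] = []

    def fake_create(path: Path) -> None:
        """Stand-in for the real environment-creation step."""
        (path / "built.txt").write_text("ok")
        builds.append(path)

    key_v1 = input_key("dependencies:\n  - numpy=1.20\n", {"clean": False})
    key_v2 = input_key("dependencies:\n  - numpy=1.21\n", {"clean": False})
    results = [
        ensure_environment(env_dir, key_v1, fake_create),  # created
        ensure_environment(env_dir, key_v1, fake_create),  # reused as-is
        ensure_environment(env_dir, key_v2, fake_create),  # deleted and recreated
    ]
print(results)
```

Combined with a package cache that is not cleaned, the third call is cheap in practice, which is the point made in the comment.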

As for mamba, I think it may be valuable; I'll create another issue to research that topic.

spietras, Jul 10 '21 22:07