python-novice-inflammation

How to set up a virtual environment with Anaconda (and/or venv)

Open jiqicn opened this issue 2 years ago • 3 comments

I think it would be useful to add content on creating a virtual environment to the "Summary and Setup" section. There are several reasons for doing that:

  • It's a very useful concept.
  • I guess we don't want to ruin the system-level Python installation or the base env of Anaconda on students' computers.
  • Anaconda is already introduced in the course, so virtual environments can easily be introduced together with the Anaconda setup.

jiqicn avatar Jul 19 '23 09:07 jiqicn

Hi @jiqicn ,

I feel the same! Python virtual environments are one of the most important things to understand when using Python.

But I would prefer making a new episode between episode 1 and 2, so the first line of code in EP#2, which is import numpy, would make sense.

I think with the recent changes around Anaconda, if we are going to make such an episode, we probably need to introduce or compare conda, mamba, pip, venv, and uv, as well as some common best practices for these choices. For example, running pip inside a conda env for the best encapsulation.

Jupyter is also important. I would include some information about the connection between a conda env and a Jupyter kernel. A Jupyter setup section might also be useful.
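To illustrate the connection between an environment and a kernel, here is a minimal sketch that learners could run in a notebook cell or a script to see which Python installation is actually executing their code. The function name `describe_interpreter` is made up for this example. It relies only on the standard library, and the last entry assumes the environment directory is named after the environment, which is common but not guaranteed.

```python
import sys
from pathlib import Path


def describe_interpreter() -> dict:
    """Report which Python installation is running this code."""
    return {
        # Full path to the Python executable running this cell or script
        "executable": sys.executable,
        # Root directory of the environment that executable belongs to
        "prefix": sys.prefix,
        # Directory name, which is often (not always) the environment name
        "environment_dir": Path(sys.prefix).name,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the reported prefix is not the environment the learner activated, the notebook is probably using a kernel registered from a different environment.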

We ran this workshop at my university this January, and I was a helper. We were using our HPC cluster, so lots of stuff was already set up for the learners, but we still needed to add env usage info as part of the workshop setup.

Regards, Nil

NilaBlueshirt avatar Apr 11 '25 03:04 NilaBlueshirt

It might be useful to check out the recent posts made by @tobyhodges about updating the overall fundamental instructions for all our lessons: https://carpentries.org/blog/2025/03/lesson-setup-instructions-task-force-recommendations/

There's been a lot of discussion, consultation and feedback about what might be suitable in a post-Anaconda world, so I would very much recommend joining in!

froggleston avatar Apr 11 '25 10:04 froggleston

Thanks for the info! Do we have an active discussion thread somewhere about this topic?

I think switching to Miniforge and conda-forge is a very stable choice, and using mamba instead of conda is even better.

More info: https://conda-forge.org/docs/user/transitioning_from_defaults/

NilaBlueshirt avatar Apr 16 '25 21:04 NilaBlueshirt

This is a modified version of an email I sent to the Curriculum Team in response to @tobyhodges's blog post on this subject - it was suggested I open an issue, but I'd rather not duplicate this one.

I taught the Python Inflammation lesson at a SWC workshop last week where users had a variety of issues with the Python setup using Miniforge. That the lesson material is currently out of step with the setup instructions on the workshop webpage added to the confusion. Many of the issues related to the creation and activation of the virtual environment.

I think that moving the creation of the virtual environment from the pre-workshop setup to part of the lesson would make things less intimidating and stressful for the learners and the reduction in time spent sorting out setup issues would help mitigate the effect of the extra content in the lesson. Moreover, many modern Python installations strongly discourage the global installation of packages, so virtual environments are increasingly likely to be encountered early on in one's use of Python.
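As a rough illustration of the point about global installs being discouraged, the sketch below checks two things using only the standard library: whether the running interpreter is inside a venv-style environment, and whether the base installation carries the `EXTERNALLY-MANAGED` marker file described in PEP 668. The function names are invented for this example. Note the stated limitation: the `sys.prefix` comparison detects venv-style environments, not conda environments.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtual_environment() -> bool:
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the Python it was created from.
    # Limitation: this does not detect conda environments.
    return sys.prefix != sys.base_prefix


def stdlib_has_externally_managed_marker() -> bool:
    # PEP 668: a file named EXTERNALLY-MANAGED in the standard library
    # directory asks installers such as pip to refuse global installs.
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.exists()


print("in a virtual environment:", in_virtual_environment())
print("base install is externally managed:", stdlib_has_externally_managed_marker())
```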

So I think it is essential that learners are taught about virtual environments in our Python lessons.

However, I would suggest that this should be held back until the users have been introduced to some Python fundamentals and, in particular, have used some Standard Library functions (even if they are only print() and type()). This would mean that this first part of the lesson could not be taught using a Jupyter Notebook - the REPL would probably make most sense. Waiting until the point at which we want to start using packages that are not part of the Standard Library should help learners understand why we need virtual environments. The installation and introduction of Jupyter could then help to demonstrate the power of the packages available. The display of https://xkcd.com/353/ is optional.

I typically encounter manuals and READMEs which refer to pip and venv much more than conda, and I suspect that learners will, as well. Therefore, I suggest using pip and venv to set up the virtual environment, while mentioning that conda and other package managers also exist.

This episode of the lesson should cover

  • A description of what virtual environments are and why they are useful
  • Creating a virtual environment
  • Activating and deactivating a virtual environment
  • Installing packages in a virtual environment (with a sidebar on uninstalling and the existence of other pip commands)
  • Installing packages from a requirements.txt file (with a sidebar on creating one)
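The creation and installation steps listed above could be sketched with the standard library `venv` module, as below. This is only an illustration under some assumptions: the environment name `lesson-env` and both function names are hypothetical, the environment is created without pip so that the sketch runs quickly and offline, and the install command is built but not executed. Activation itself is a shell operation, so it is not shown here. Calling the environment's own Python directly has a similar effect for installing packages.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its Python."""
    # with_pip=False keeps this sketch fast and offline; a real lesson
    # environment would normally be created with pip available.
    # Symlinks are used everywhere except Windows, like `python -m venv`.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return env_dir / bin_dir / exe


def pip_install_command(env_python: Path, requirements: Path) -> list[str]:
    """Build (but do not run) the command that installs from a requirements file."""
    return [str(env_python), "-m", "pip", "install", "-r", str(requirements)]


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "lesson-env"  # hypothetical environment name
    env_python = create_environment(env_dir)
    print("environment created:", (env_dir / "pyvenv.cfg").is_file())
    print(" ".join(pip_install_command(env_python, Path("requirements.txt"))))
```

Running pip through the environment's interpreter (`python -m pip`) makes it explicit which environment receives the packages.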

pgmccann avatar Jun 26 '25 08:06 pgmccann

I can see pip is easier to set up for beginners, but I don't think pip can handle complicated dependency trees like conda, and pip has different installation commands for some packages. That means we probably need to do lots of testing and some adjustments if we are going to switch to pip. And I personally don't think we should go back to pip, which is like going backward.

NilaBlueshirt avatar Jun 26 '25 15:06 NilaBlueshirt

There are a range of reasons why we have recommended (via the Lesson Setup Task Force discussions) the use of conda, which include:

  • We want to provide an environment that is as universal as possible across all our core curriculum lessons. One of the critical bits of feedback we receive is that lesson setup instructions are complicated and often lesson-specific. We want to minimise this up front before any workshop starts. A "coverall" setup that gives learners the easiest route to the core tools they need is paramount.
  • As @NilaBlueshirt says, conda has extra benefits in that it contains a lot of software that isn't packaged with pip, including many bioinformatics and data science tools. This is a bonus to the point above where users don't have to use a whole new virtual environment system to get a specific tool for a specific part of a lesson.
  • As we move away from Anaconda, we want to keep as much of the lessons unchanged to ease the burden on our maintainers. Using miniconda in place of anaconda should be the path of least resistance.

In terms of virtual environment teaching, I completely agree that they are essential parts of a modern data science setup, so we should definitely include the "most useful first" concepts without overloading learners who may have never seen the shell before let alone fired up a virtual python environment. I'd really like to work together on this kind of material!

In any case, we strongly advise that workshop hosts and instructors run a pre-workshop "drop in" or "helpdesk" session where learners can get help setting up the tools required. Doing so at the start of workshops can really eat into teaching time!

froggleston avatar Jun 26 '25 16:06 froggleston

Thanks all for contributing to this discussion so far. I want to add one more point that has not been (directly) covered so far, then prompt a discussion of next steps.

On pip+venv vs conda, which was the topic of much debate in the task force meetings last year, I would like to keep focus on what I see as the most important point: our goal should be to give learners a minimal, working mental model of environments. They need enough info to be able to follow the workshop and keep working with Python afterwards. I see one aspect of that as making learners aware that various different solutions for env management exist for Python: pip+venv, conda, uv, pixi, etc (a different xkcd is relevant here). If we get it right, learners should leave with enough understanding of the principles of environment management that they can understand what is happening with pip/venv in those manuals and READMEs mentioned by @pgmccann.

However, there is really not space in the lessons for a substantial comparison of these different solutions. I am thinking more along the lines of a callout mentioning that other tools for environment management exist, briefly describing the similarities and differences between them, and linking to resources where learners can find out more about each.

I am very grateful for the offers from @pgmccann and @froggleston to get involved with the development of new content on this topic. I would like to convene a group of community members who can begin working on the changes needed to this lesson, possibly leading to similar changes being made to the other Python lessons in SWC, LC, and DC. If you are both happy to join me in that, I would be delighted. I will put a call out on relevant community channels as well.

tobyhodges avatar Jun 30 '25 12:06 tobyhodges

Here is a concept map representing the mental model I would like learners to leave the lesson/workshop with:

flowchart
  env[environment]
  pydeps[Python dependencies]
  otherdeps[other tools and dependencies]
  proj[project]
  conda
  repro[reproducibility]
  errors[incompatibility errors]
  envtools[tools to manage Python environments]
  act[<code>conda activate</code>]
  create[<code>conda create</code>]
  deact[<code>conda deactivate</code>]
  install[<code>conda install</code>]
  env --contains--> pydeps
  env --may contain--> otherdeps
  env--specific to --> proj
  conda --configures and manages--> env
  env --enhances--> repro
  env --helps to avoid--> errors
  conda --"one of many"--> envtools
  act--subcommand of-->conda
  deact--subcommand of-->conda
  create--subcommand of-->conda
  install--subcommand of-->conda
  create--makes a new-->env
  install--adds a new-->pydeps
  install--adds a new-->otherdeps
  act--start working in-->env
  deact--stop working in-->env

I tried to keep the above to an absolute minimum and perhaps you will agree that it is already a lot of new knowledge to teach to novice learners. We will need to be very careful to only include lesson content that is directly relevant.

Here are some proposed learning objectives for the episode:

By the end of the episode, learners should be able to...

  • Identify some advantages of creating an environment for their scripts to run in.
  • Create a new environment
  • Install Python libraries into an environment
  • Activate and deactivate an environment
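One possible way to make the "environment contains Python dependencies" idea concrete in an exercise is to have learners list what the current environment contains before and after installing a library. The sketch below uses the standard library `importlib.metadata` module. The function name `installed_packages` is invented for this example, and the output will differ from one environment to the next.

```python
from importlib import metadata


def installed_packages() -> list[tuple[str, str]]:
    """Return (name, version) pairs for distributions visible to this interpreter."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata lacks a name
            pairs.add((name, dist.version))
    # Sort case-insensitively by name so the listing is easy to scan
    return sorted(pairs, key=lambda pair: pair[0].lower())


for name, version in installed_packages():
    print(f"{name}=={version}")
```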

If we can design an exercise for each of the objectives above, I think we will be in a good position.

Looking forward to your feedback.

tobyhodges avatar Jun 30 '25 13:06 tobyhodges

@tobyhodges This looks terrific! Are you trying to add this episode to every Python-based workshop?

For the "Tools to manage Python environments" part, are you going to mention other tools like pip? I ask this because I met lots of learners in my institute who found it very confusing to understand what's going on with pip, i.e. why use conda when we have pip, why use pip when we have conda, what if some packages need pip but some need conda, etc.
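For mixed pip/conda environments, one small diagnostic that may help with this confusion is to ask a package's metadata which tool installed it. The sketch below reads the `INSTALLER` file that installers can record alongside a package. It assumes the installer wrote that file: pip does, and conda-built packages commonly record `conda` there, but this is not guaranteed. The function name `installer_of` is invented for this example.

```python
from importlib import metadata
from typing import Optional


def installer_of(package: str) -> Optional[str]:
    """Return the recorded installer for a package, or None if unknown."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment
        return None
    # read_text returns None when the INSTALLER file is absent
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


for name in ("numpy", "pip"):
    print(name, "->", installer_of(name))
```

A result of `None` means either that the package is not installed or that no installer was recorded.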

NilaBlueshirt avatar Jul 07 '25 01:07 NilaBlueshirt

Yes, we will need to add something to both SWC novice Python lessons, and to the novice lessons elsewhere too. It shouldn't be needed in the lessons where foundational knowledge of Python is a prerequisite e.g. Data Carpentry Image Processing.

> For the "Tools to manage Python environments" part, are you going to mention other tools like pip? I ask this because I met lots of learners in my institute who found it very confusing to understand what's going on with pip, i.e. why use conda when we have pip, why use pip when we have conda, what if some packages need pip but some need conda, etc.

I tried to get at this with a previous comment:

> However, there is really not space in the lessons for a substantial comparison of these different solutions. I am thinking more along the lines of a callout mentioning that other tools for environment management exist, briefly describing the similarities and differences between them, and linking to resources where learners can find out more about each.

The key word there being "briefly" because I think we will need to keep things short and sweet. As I mentioned above, the challenge will be optimising this content so that it delivers what is needed for learners to make progress with nothing extraneous.

tobyhodges avatar Jul 08 '25 14:07 tobyhodges