Python Virtual Environments

Introduction:

Python is a popular programming language on many types of systems, including HPC systems. It has so many add-on packages that it does not make sense to install all of them in a centrally managed Python environment. So how can you install the packages that you need? Virtual Environments. When you create a Virtual Environment (VE), you create an environment that is built from the base, system-wide Python installation but lives in your own user space. Once your VE is created and activated, you can install packages into it. You can create multiple VEs for different tasks, which makes it easier to manage packages that have conflicting version requirements.

Creating and using Anaconda3 Virtual Environments

For the most part, we use Anaconda3 as the Python distribution on the ACG HPC systems. To load Anaconda3 into your shell environment, run the following in a terminal session on Katahdin:

module load anaconda3

Loading the module does not, by default, initialize Virtual Environments in Anaconda. This matters because the VE system and SLURM do not always work well together. To initialize your "base" Virtual Environment, run:

$INIT_CONDA
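Because $INIT_CONDA is run as a command, the shell expands the variable and executes whatever it contains. The sketch below prints that content so you can see it before running it. It assumes that the anaconda3 module exports INIT_CONDA as an environment variable; the helper name show_init_conda is made up for this illustration and is not part of the module.

```shell
# Hypothetical helper: print the command stored in INIT_CONDA,
# or a hint if the variable is unset (module not loaded yet).
show_init_conda() {
    if [ -n "${INIT_CONDA:-}" ]; then
        echo "INIT_CONDA expands to: ${INIT_CONDA}"
    else
        echo "INIT_CONDA is not set; run 'module load anaconda3' first"
    fi
}

show_init_conda
```

If the hint is printed, load the module first and try again.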


Next, create a Conda Virtual Environment. The last parameter is the name of the environment, which can be anything you like:


conda create --name cadillac


Then activate the VE so that you can use it and install packages into it:


conda activate cadillac
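To confirm that activation worked, you can check which environment the shell considers active. This is a minimal sketch: CONDA_DEFAULT_ENV is the variable that "conda activate" sets to the environment name, and the variable name active_env is only used for this illustration. The listing step is skipped when conda is not available in the current shell.

```shell
# CONDA_DEFAULT_ENV is set by "conda activate"; fall back to "none" if unset.
active_env="${CONDA_DEFAULT_ENV:-none}"
echo "Active conda environment: ${active_env}"

# List every environment conda knows about, if conda is available here.
if command -v conda >/dev/null 2>&1; then
    conda env list
fi
```

To leave the VE again, run "conda deactivate".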


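From there, you can install packages into your new VE using the "conda" command or the "pip" command. The "pip" command installs into the VE only when the VE contains its own Python and pip; an environment created without any packages, as above, has neither until they are installed, for example with "conda install python pip". To install numpy with either tool: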


conda install numpy

pip install numpy
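Before relying on pip, it can help to check that the python and pip commands on your PATH actually belong to the active VE. The following is a sketch under stated assumptions: CONDA_PREFIX is the variable that "conda activate" sets to the active environment's directory, and classify_tool is a helper name invented for this example.

```shell
# Hypothetical helper: report whether a tool on PATH lives inside
# a given environment directory.
# $1 = tool name, $2 = environment directory (may be empty)
classify_tool() {
    tool_path="$(command -v "$1" 2>/dev/null || true)"
    if [ -z "$tool_path" ]; then
        echo "$1: not found on PATH"
    elif [ -n "$2" ] && [ "${tool_path#"$2"/}" != "$tool_path" ]; then
        echo "$1: $tool_path (inside the active VE)"
    else
        echo "$1: $tool_path (not inside an active VE)"
    fi
}

# CONDA_PREFIX is empty when no environment is active.
classify_tool python "${CONDA_PREFIX:-}"
classify_tool pip "${CONDA_PREFIX:-}"
```

If pip is reported as not inside the VE, packages installed with it will land somewhere other than the VE.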


Putting it all together, the following session creates a VE called "cadillac" and installs the zlib package into it:


[cousins@katahdin ~]$ module load anaconda3
[cousins@katahdin ~]$ $INIT_CONDA
(base) [cousins@katahdin ~]$ conda create --name cadillac
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.8.3
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base -c defaults conda

## Package Plan ##

  environment location: /home/cousins/.conda/envs/cadillac

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate cadillac
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) [cousins@katahdin ~]$ conda activate cadillac
(cadillac) [cousins@katahdin ~]$ conda install zlib
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.8.3
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base -c defaults conda

## Package Plan ##

  environment location: /home/cousins/.conda/envs/cadillac

  added / updated specs:
    - zlib

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
  libgcc-ng          conda-forge/linux-64::libgcc-ng-11.2.0-h1d223b6_15
  libgomp            conda-forge/linux-64::libgomp-11.2.0-h1d223b6_15
  libzlib            conda-forge/linux-64::libzlib-1.2.11-h166bdaf_1014
  zlib               conda-forge/linux-64::zlib-1.2.11-h166bdaf_1014

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done



To use a VE in a cluster job, put these commands in your SLURM script before the line that runs your program:


module load anaconda3

$INIT_CONDA

conda activate cadillac

From there, you can run the python command to run your Python script.
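As an illustration, the snippet below writes a complete batch script to a file. It is a sketch, not a site-approved template: the file name ve_job.slurm, the job name, the resource values, and my_script.py are placeholders chosen for this example, and no partition is named because partition names are site-specific. The heredoc delimiter is quoted so that $INIT_CONDA is written to the file literally and only expanded when the job runs.

```shell
# Write a minimal SLURM batch script that activates the "cadillac" VE.
cat > ve_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=ve-example         # placeholder job name
#SBATCH --ntasks=1                    # one task; adjust to your workload
#SBATCH --time=00:10:00               # placeholder wall-clock limit
#SBATCH --output=ve-example-%j.out    # %j is replaced by the job ID

module load anaconda3
$INIT_CONDA
conda activate cadillac

# my_script.py is a placeholder for your own Python program.
python my_script.py
EOF

# Syntax-check the generated script without running it.
bash -n ve_job.slurm && echo "ve_job.slurm written"
```

You would then submit the job with "sbatch ve_job.slurm".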