Customizing Nvidia Containers
Nvidia GPU-optimized Container Catalog
Clicking on the TensorFlow container brings up detailed information about it, including instructions on how to use it.
The path to the container can be found by clicking the "Copy Image Path" button and choosing one of the available versions. This path is needed when creating the Definition file. For example, the most recent TensorFlow 2 image at the time of writing points to nvcr.io/nvidia/tensorflow:22.07-tf2-py3, and this will go on the second line of the new Definition file, which starts with "From: ".
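For instance, the first two lines of the Definition file would then read (the "Bootstrap: docker" line tells Singularity to pull the image from a Docker-style registry):

```
Bootstrap: docker
From: nvcr.io/nvidia/tensorflow:22.07-tf2-py3
```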
Creating the Definition file with the Nvidia Container path:
On Katahdin, open a text editor and create a new file called "new_container.def" with the following contents, where the "From:" line contains the path you copied from the NGC site in the previous step:
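The original file contents are not reproduced here, but a minimal sketch of such a Definition file might look like the following. The pip package shown (matplotlib) is only an illustration; substitute whatever software you need:

```
Bootstrap: docker
From: nvcr.io/nvidia/tensorflow:22.07-tf2-py3

%post
    # Illustrative customization: install an extra Python package
    # inside the container (package name is a placeholder)
    pip install matplotlib
```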
The added "pip" line is what customizes the container. You can install other software using the "apt" command, additional "pip" commands, and other methods as well.
Save the file, then run the following commands to build the new container. The end result will be a file with a ".simg" extension placed in your home directory. The process starts by sshing to a system that has Nvidia GPUs; this may not be strictly necessary, but it keeps the load off of Katahdin.
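The command listing is not reproduced above; a sketch of what it might look like follows. The GPU hostname (grtx-1) comes from the discussion below, but the exact sequence and filenames are assumptions:

```shell
# 1. ssh to a system with Nvidia GPUs
ssh grtx-1

# 2. go to the directory containing the Definition file (path is an example)
cd ~

# 3. point TMPDIR at the RAM-backed tmpfs created for this ssh session
export TMPDIR=$XDG_RUNTIME_DIR

# 4. build the .simg container image from the .def file as a non-root user
singularity build --fakeroot new_container.simg new_container.def
```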
The third line sets the TMPDIR variable for the singularity command to use. This is done for a couple of reasons, but the biggest benefit is that the XDG_RUNTIME_DIR variable points to a tmpfs volume that gets created when you ssh to the grtx-1 system. This tmpfs volume lives in RAM, so it is very fast; pointing TMPDIR at it speeds up container creation tremendously.
The singularity command runs the "build" subcommand to create the .simg file, using the .def file to know how to build it. The "--fakeroot" parameter is needed so that regular, non-root accounts can build the container.
Once the container has been created, you can use it in a Slurm job by adding the following to your job submission script:
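The original snippet is not reproduced here; a sketch of such a submission script might look like the following. The job name, partition name, and resource requests are placeholders for your site's configuration:

```shell
#!/bin/bash
#SBATCH --job-name=tf-container    # job name (placeholder)
#SBATCH --partition=gpu            # GPU partition name is an assumption
#SBATCH --gres=gpu:1               # request one GPU

# Run the Python script inside the custom container; the --nv flag
# exposes the host's Nvidia GPU drivers to the container.
singularity exec --nv ~/new_container.simg python my_python_script.py
```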
where "my_python_script.py" is the name of your Python script.