Containers
ABCI allows users to create application execution environments using Singularity containers. Users can build their own customized environments, or build compute environments on ABCI that are equivalent to those provided by container images officially distributed by external organizations.
For example, the NGC Catalog provides container images of various deep learning frameworks, CUDA, and HPC environments. See NVIDIA NGC for tips on how to use the NGC Catalog with ABCI.
You can also download container images with the latest software installed from official or verified repositories on Docker Hub. However, be careful not to use untrusted container images.
Singularity
Singularity is available on the ABCI System.
The available version is SingularityPRO 4.1.
To use Singularity, set up the user environment with the module command:
[username@g0001 ~]$ module load singularitypro
A more comprehensive user guide for Singularity can be found in the official SingularityPRO documentation.
To run NGC-provided Docker images on ABCI with Singularity, see NVIDIA NGC.
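As a quick illustration, an NGC container image can be run with Singularity in the same way as a Docker Hub image. This is only a sketch; the image name and tag below are taken from an example later on this page and are purely illustrative.
[username@g0001 ~]$ module load singularitypro
[username@g0001 ~]$ singularity run --nv docker://nvcr.io/nvidia/pytorch:22.10-py3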
Create a Singularity image (pull)
A Singularity container image can be stored as a file.
This procedure shows how to create a Singularity image file using pull.
Example) Create a Singularity image file using pull
[username@es1 ~]$ module load singularitypro
[username@es1 ~]$ export SINGULARITY_TMPDIR=/scratch/$USER
[username@es1 ~]$ singularity pull tensorflow.img docker://tensorflow/tensorflow:latest-gpu
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
...
[username@es1 ~]$ ls tensorflow.img
tensorflow.img
The SINGULARITY_TMPDIR environment variable specifies the location where temporary files are created when the pull or build commands are executed.
Please refer to the FAQ "I get an error due to insufficient disk space, when I ran the singularity build/pull on the compute node." for more information.
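For example, when pulling on a compute node inside a job, a node-local directory can be used for the temporary files. A minimal sketch using $SGE_LOCALDIR (with the same image as the example above):
[username@g0001 ~]$ module load singularitypro
[username@g0001 ~]$ export SINGULARITY_TMPDIR=$SGE_LOCALDIR
[username@g0001 ~]$ singularity pull tensorflow.img docker://tensorflow/tensorflow:latest-gpu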
Create a Singularity image (build)
In the SingularityPRO environment of the ABCI system, you can build container image files using the fakeroot option.
Note
In the SingularityPRO environment, you can also build container image files using a remote build. See ABCI Singularity Endpoint for more information.
Warning
When using the fakeroot option, only node-local areas (such as /tmp or $SGE_LOCALDIR) can be specified for the SINGULARITY_TMPDIR environment variable.
The home area ($HOME), group area (/groups/$YOUR_GROUP), and global scratch area (/scratch/$USER) cannot be specified.
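For example, when SINGULARITY_TMPDIR needs to be set for a fakeroot build inside a job, a node-local directory such as $SGE_LOCALDIR can be used. This is a sketch using the ubuntu.def recipe from the example below:
[username@g0001 ~]$ export SINGULARITY_TMPDIR=$SGE_LOCALDIR
[username@g0001 ~]$ singularity build --fakeroot ubuntu.sif ubuntu.def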
Example) Create a Singularity image file using build
[username@es1 ~]$ module load singularitypro
[username@es1 ~]$ cat ubuntu.def
Bootstrap: docker
From: ubuntu:20.04
%post
apt-get update
apt-get install -y lsb-release
%runscript
lsb_release -d
[username@es1 ~]$ singularity build --fakeroot ubuntu.sif ubuntu.def
INFO: Starting build...
(snip)
INFO: Creating SIF file...
INFO: Build complete: ubuntu.sif
[username@es1 singularity]$
If the output destination of the image file (ubuntu.sif) is set to the group area in the above command, an error occurs. In this case, you can avoid the problem by checking the group that owns the destination group area with the id command, and then executing the newgrp command as follows.
In the example below, gaa00000 is the group that owns the destination group area.
[username@es1 groupname]$ id -a
uid=0000(aaa00000aa) gid=0000(aaa00000aa) groups=0000(aaa00000aa),00000(gaa00000)
[username@es1 groupname]$ newgrp gaa00000
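After switching groups with newgrp, the build to the group area can be retried. A minimal sketch, assuming /groups/gaa00000 is your group area (replace it with your own group's path):
[username@es1 groupname]$ singularity build --fakeroot /groups/gaa00000/ubuntu.sif ubuntu.def  # /groups/gaa00000 is a placeholder path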
Running a container with Singularity
When you use Singularity, you need to start the Singularity container with the singularity run command in your job script.
To run a container from an image file, specify the image file as an argument to the singularity run command.
You can also use the singularity run command to run a container image published on Docker Hub.
Example) Run a container with a Singularity image file in an interactive job
[username@es1 ~]$ qrsh -g grpname -l rt_F=1 -l h_rt=1:00:00
[username@g0001 ~]$ module load singularitypro
[username@g0001 ~]$ singularity run --nv ./tensorflow.img
Example) Run a container with a Singularity image file in a batch job
[username@es1 ~]$ cat job.sh
#!/bin/sh
#$-l rt_F=1
#$-j y
source /etc/profile.d/modules.sh
module load singularitypro
singularity run --nv ./tensorflow.img
[username@es1 ~]$ qsub -g grpname job.sh
Example) Run a container image published on Docker Hub
The following sample runs a Singularity container using a TensorFlow container image published on Docker Hub.
python3 sample.py is executed in the container started by the singularity run command.
The container image is downloaded at the first startup and cached in the home area.
The second and subsequent startups are faster because the cached data is used.
[username@es1 ~]$ qrsh -g grpname -l rt_F=1 -l h_rt=1:00:00
[username@g0001 ~]$ module load singularitypro
[username@g0001 ~]$ export SINGULARITY_TMPDIR=$SGE_LOCALDIR
[username@g0001 ~]$ singularity run --nv docker://tensorflow/tensorflow:latest-gpu
________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/
You are running this container as user with ID 10000 and group 10000,
which should map to the ID and group for your user on the Docker host. Great!
/sbin/ldconfig.real: Can't create temporary cache file /etc/ld.so.cache~: Read-only file system
Singularity> python3 sample.py
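If the cached image data in the home area grows too large, it can be listed and removed with the singularity cache subcommand. A minimal sketch, assuming the cache subcommand behaves as in SingularityCE:
[username@es1 ~]$ singularity cache list
[username@es1 ~]$ singularity cache clean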
Build Singularity image from Dockerfile
On ABCI, you cannot build a Singularity image directly from a Dockerfile. If you only have a Dockerfile, there are two ways to build a Singularity image on ABCI.
Via Docker Hub
Build a Docker container image from the Dockerfile on a system that has a Docker execution environment, and upload the image to Docker Hub. You can then use the Docker container image on ABCI.
The following example shows how to build the SSD300 v1.1 image developed by NVIDIA from its Dockerfile and upload it to Docker Hub.
[user@pc ~]$ git clone https://github.com/NVIDIA/DeepLearningExamples
[user@pc ~]$ cd DeepLearningExamples/PyTorch/Detection/SSD
[user@pc SSD]$ cat Dockerfile
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:20.06-py3
FROM ${FROM_IMAGE_NAME}
# Set working directory
WORKDIR /workspace
ENV PYTHONPATH "${PYTHONPATH}:/workspace"
COPY requirements.txt .
RUN pip install --no-cache-dir git+https://github.com/NVIDIA/dllogger.git#egg=dllogger
RUN pip install -r requirements.txt
RUN python3 -m pip install pycocotools==2.0.0
# Copy SSD code
COPY ./setup.py .
COPY ./csrc ./csrc
RUN pip install .
COPY . .
[user@pc SSD]$ docker build -t user/docker_name .
[user@pc SSD]$ docker login && docker push user/docker_name
To run the built image on ABCI, please refer to Running a container with Singularity.
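For reference, the pushed image can then be run on ABCI like any other Docker Hub image. A minimal sketch using the user/docker_name tag from the example above:
[username@es1 ~]$ qrsh -g grpname -l rt_F=1 -l h_rt=1:00:00
[username@g0001 ~]$ module load singularitypro
[username@g0001 ~]$ export SINGULARITY_TMPDIR=$SGE_LOCALDIR
[username@g0001 ~]$ singularity run --nv docker://user/docker_name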
Convert Dockerfile to Singularity recipe
By converting the Dockerfile to a Singularity recipe, you can build a Singularity container image on ABCI that provides the same functionality as defined in the Dockerfile. You can convert the Dockerfile manually, but using Singularity Python helps with the conversion.
Warning
The conversion by Singularity Python is not perfect.
If singularity build fails with the generated Singularity recipe file, modify the recipe file manually.
Example) Installing Singularity Python
[username@es1 ~]$ module load python/3.10
[username@es1 ~]$ python3 -m venv work
[username@es1 ~]$ source work/bin/activate
(work) [username@es1 ~]$ pip3 install spython
The following example shows how to convert the Dockerfile of the SSD300 v1.1 image developed by NVIDIA using Singularity Python, and how to modify the generated Singularity recipe (ssd.def) so that it correctly generates a Singularity image.
Modifications)
- Files are not copied into WORKDIR automatically => set the copy destination to the absolute path of WORKDIR
[username@es1 ~]$ module load python/3.10
[username@es1 ~]$ source work/bin/activate
(work) [username@es1 ~]$ git clone https://github.com/NVIDIA/DeepLearningExamples
(work) [username@es1 ~]$ cd DeepLearningExamples/PyTorch/Detection/SSD
(work) [username@es1 SSD]$ spython recipe Dockerfile ssd.def
(work) [username@es1 SSD]$ cp -p ssd.def ssd_org.def
(work) [username@es1 SSD]$ vi ssd.def
Bootstrap: docker
From: nvcr.io/nvidia/pytorch:22.10-py3
Stage: spython-base
%files
requirements.txt /workspace/ssd/ #<- copy to WORKDIR directory.
. /workspace/ssd/ #<- copy to WORKDIR directory.
%post
FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:22.10-py3
# Set working directory
mkdir -p /workspace/ssd
cd /workspace/ssd
# Copy the model files
# Install python requirements
pip install --no-cache-dir -r requirements.txt
mkdir models #<- Required to run main.py
CUDNN_V8_API_ENABLED=1
TORCH_CUDNN_V8_API_ENABLED=1
%environment
export CUDNN_V8_API_ENABLED=1
export TORCH_CUDNN_V8_API_ENABLED=1
%runscript
cd /workspace/ssd
exec /bin/bash "$@"
%startscript
cd /workspace/ssd
exec /bin/bash "$@"
To create a Singularity image from the generated recipe file on ABCI, please refer to Create a Singularity image (build).
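For reference, building the converted recipe might look like the following. This is a sketch; ssd.sif is an arbitrary output filename, not one mandated by the recipe:
[username@es1 SSD]$ module load singularitypro
[username@es1 SSD]$ singularity build --fakeroot ssd.sif ssd.def  # ssd.sif is an example output name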
Examples of Singularity recipe files
This section shows examples of Singularity recipe files. See the Singularity user guide for more information about recipe files.
Including local files in the container image
This is an example of building Open MPI and a local program file (C language) into a container image. In this case, place the Singularity recipe file (openmpi.def) and the program file (mpitest.c) in your home directory.
openmpi.def
Bootstrap: docker
From: ubuntu:latest
%files
mpitest.c /opt
%environment
export OMPI_DIR=/opt/ompi
export SINGULARITY_OMPI_DIR=$OMPI_DIR
export SINGULARITYENV_APPEND_PATH=$OMPI_DIR/bin
export SINGULARITYENV_APPEND_LD_LIBRARY_PATH=$OMPI_DIR/lib
%post
echo "Installing required packages..."
apt-get update && apt-get install -y wget git bash gcc gfortran g++ make file bzip2
echo "Installing Open MPI"
export OMPI_DIR=/opt/ompi
export OMPI_VERSION=4.1.5
export OMPI_URL="https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-$OMPI_VERSION.tar.bz2"
mkdir -p /tmp/ompi
mkdir -p /opt
# Download
cd /tmp/ompi && wget -O openmpi-$OMPI_VERSION.tar.bz2 $OMPI_URL && tar -xjf openmpi-$OMPI_VERSION.tar.bz2
# Compile and install
cd /tmp/ompi/openmpi-$OMPI_VERSION && ./configure --prefix=$OMPI_DIR && make install
# Set env variables so we can compile our application
export PATH=$OMPI_DIR/bin:$PATH
export LD_LIBRARY_PATH=$OMPI_DIR/lib:$LD_LIBRARY_PATH
export MANPATH=$OMPI_DIR/share/man:$MANPATH
echo "Compiling the MPI application..."
cd /opt && mpicc -o mpitest mpitest.c
mpitest.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h> /* for EXIT_SUCCESS / EXIT_FAILURE */
int main (int argc, char **argv) {
int rc;
int size;
int myrank;
rc = MPI_Init (&argc, &argv);
if (rc != MPI_SUCCESS) {
fprintf (stderr, "MPI_Init() failed\n");
return EXIT_FAILURE;
}
rc = MPI_Comm_size (MPI_COMM_WORLD, &size);
if (rc != MPI_SUCCESS) {
fprintf (stderr, "MPI_Comm_size() failed\n");
goto exit_with_error;
}
rc = MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
if (rc != MPI_SUCCESS) {
fprintf (stderr, "MPI_Comm_rank() failed\n");
goto exit_with_error;
}
fprintf (stdout, "Hello, I am rank %d/%d\n", myrank, size);
MPI_Finalize();
return EXIT_SUCCESS;
exit_with_error:
MPI_Finalize();
return EXIT_FAILURE;
}
Use the singularity command to build the container image. If successful, a container image (openmpi.sif) is generated.
[username@es1 ~]$ qrsh -g grpname -l rt_G.small=1
[username@g0001 ~]$ module load singularitypro
[username@g0001 ~]$ singularity build --fakeroot openmpi.sif openmpi.def
INFO: Starting build...
Getting image source signatures
(snip)
INFO: Adding environment to container
INFO: Creating SIF file...
INFO: Build complete: openmpi.sif
[username@g0001 ~]$
Example) Running the container
[username@g0001 ~]$ module load singularitypro hpcx/2.12
[username@g0001 ~]$ mpirun -hostfile $SGE_JOB_HOSTLIST -np 4 -map-by node singularity exec --env OPAL_PREFIX=/opt/ompi --env PMIX_INSTALL_PREFIX=/opt/ompi openmpi.sif /opt/mpitest
Hello, I am rank 2/4
Hello, I am rank 3/4
Hello, I am rank 0/4
Hello, I am rank 1/4
Using the CUDA Toolkit
This is an example of running a Python script that uses h2o4gpu with the CUDA Toolkit. In this case, place the Singularity recipe file (h2o4gpuPy.def) and the validation script (h2o4gpu_sample.py) in your home directory.
h2o4gpuPy.def
BootStrap: docker
From: nvidia/cuda:10.2-devel-ubuntu18.04
# Note: This container will have only the Python API enabled
%environment
# -----------------------------------------------------------------------------------
export PYTHON_VERSION=3.6
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64/:$CUDA_HOME/lib/:$CUDA_HOME/extras/CUPTI/lib64
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export LC_ALL=C
%post
# -----------------------------------------------------------------------------------
# this will install all necessary packages and prepare the container
export PYTHON_VERSION=3.6
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64/:$CUDA_HOME/lib/:$CUDA_HOME/extras/CUPTI/lib64
echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
apt-get -y update && apt-get install -y --no-install-recommends \
build-essential \
git \
curl \
vim \
wget \
ca-certificates \
libjpeg-dev \
libpng-dev \
libpython3.6-dev \
libopenblas-dev pbzip2 \
libcurl4-openssl-dev libssl-dev libxml2-dev
ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python
curl -O https://bootstrap.pypa.io/get-pip.py && \
python get-pip.py && \
rm get-pip.py
wget https://s3.amazonaws.com/h2o-release/h2o4gpu/releases/stable/ai/h2o/h2o4gpu/0.4-cuda10/rel-0.4.0/h2o4gpu-0.4.0-cp36-cp36m-linux_x86_64.whl
pip install h2o4gpu-0.4.0-cp36-cp36m-linux_x86_64.whl
h2o4gpu_sample.py
import h2o4gpu
import numpy as np
X = np.array([[1.,1.], [1.,4.], [1.,0.]])
model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
print(model.cluster_centers_)
Use the singularity command to build the container image. If successful, a container image (h2o4gpuPy.sif) is generated.
[username@es1 ~]$ qrsh -g grpname -l rt_G.small=1
[username@g0001 ~]$ module load singularitypro
[username@g0001 ~]$ singularity build --fakeroot h2o4gpuPy.sif h2o4gpuPy.def
INFO: Starting build...
Getting image source signatures
(snip)
INFO: Adding environment to container
INFO: Creating SIF file...
INFO: Build complete: h2o4gpuPy.sif
[username@g0001 ~]$
Example) Running the container
[username@g0001 ~]$ module load singularitypro cuda/10.2
[username@g0001 ~]$ singularity exec --nv h2o4gpuPy.sif python3 h2o4gpu_sample.py
[[1. 0.5]
[1. 4. ]]
[username@g0001 ~]$