easybuild-framework

WIP: 20200602 dependency graph layering

Open jotelha opened this issue 4 years ago • 4 comments

This is what was briefly discussed on the EasyBuild #containers Slack channel last week.

As noted in the documentation at https://easybuild.readthedocs.io/en/latest/Containers.html#stacking-container-images, creating 'stacked' containers avoids redundant builds. For my own purposes (I wanted to migrate a workflow involving quite a few scientific software packages from an EasyBuild-maintained site to another site that allows running Singularity containers), I put together some graph partitioning and layering functionality. It arranges the dependencies of a given set of target easyconfigs in layers in order to facilitate the automatic creation of stacked container images. Those containers avoid as many redundant builds as possible and are meant to be reusable for further container builds as well. The attached hoermann_eb_dep_graph_partitioning_and_layering.pdf illustrates, without too many words, what this code does.
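To make the idea of layering concrete, here is a minimal Python sketch that groups the nodes of a dependency DAG into layers by their longest dependency chain. It is not the `dep_graph_partition` code from this PR, and it does not do any partitioning across several targets. The function name `layer_dependencies` and the toy dependency map are made up for illustration, and the dependency lists are simplified, not taken from real easyconfigs.

```python
from collections import defaultdict


def layer_dependencies(deps):
    """Group the nodes of a DAG into layers.

    deps maps each node to the nodes it depends on (hypothetical input format).
    A node lands in layer 0 if it has no dependencies, otherwise one layer
    above its deepest dependency.
    """
    depth = {}

    def resolve(node, stack=()):
        if node in stack:
            raise ValueError("dependency cycle at %s" % node)
        if node not in depth:
            children = deps.get(node, ())
            depth[node] = 1 + max(
                (resolve(child, stack + (node,)) for child in children),
                default=-1,
            )
        return depth[node]

    for node in deps:
        resolve(node)

    layers = defaultdict(list)
    for node, d in depth.items():
        layers[d].append(node)
    # lowest layer first, names sorted within a layer for stable output
    return [sorted(layers[d]) for d in sorted(layers)]


# Simplified, illustrative dependency map (not real easyconfig contents)
toy_deps = {
    "GCCcore": [],
    "zlib": ["GCCcore"],
    "binutils": ["GCCcore", "zlib"],
    "GCC": ["GCCcore", "binutils"],
    "OpenMPI": ["GCC", "zlib"],
    "OpenBLAS": ["GCC"],
}

for layer in layer_dependencies(toy_deps):
    print(" ".join(layer))
```

Each printed line is one layer, lowest first, so every entry only depends on entries in the lines before it. A container image built for one line can then serve as the bootstrap image for the next.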

Here are a few more comments.

  • I use the pygraph package (https://github.com/Shoobx/python-graph), whose last commit dates from 2018, because the framework already uses it, so no new dependencies are added. It serves its purpose well, but other packages might be able to take over some of the work done manually here (such as the transitive reduction, implemented following https://github.com/networkx/networkx/blob/9aedc31d291ac11eb0bb374c1ce8ad5cbcce02d3/networkx/algorithms/dag.py#L581).
  • I am not a graph theory specialist, and the partitioning strategy implemented in dep_graph_partition is not based on any published algorithm. It is something I made up from scratch that handles more than two targets in a not-too-dumb way, but it might be wise to replace it with an established strategy at some point.
  • For the same reason, there may be inefficient and redundant operations in there, especially the (partial) graph copying loops.
  • No special treatment for build dependencies.
  • If the log level is DEBUG, .dot files are written for the (sub)graphs that arise at different points.
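For readers unfamiliar with the transitive reduction mentioned in the first comment: it removes every edge that is implied by a longer path, so that only the "direct" dependencies remain. The following is a minimal sketch in plain Python for a DAG stored as a dict of successor sets. It is neither the code in this PR nor the networkx implementation linked above, and the function name and the toy graph are assumptions made for illustration.

```python
def transitive_reduction(edges):
    """Return the transitive reduction of a DAG.

    edges maps each node to the set of its direct successors
    (hypothetical input format). An edge u -> v is dropped when v is
    also reachable from u through another successor of u.
    """

    def descendants(start):
        # all nodes reachable from start, excluding start itself (DAG)
        seen, stack = set(), list(edges.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        return seen

    reduced = {}
    for u, successors in edges.items():
        keep = set(successors)
        for v in successors:
            keep -= descendants(v)
        reduced[u] = keep
    return reduced


# Simplified, illustrative graph: the target lists GCCcore directly,
# although it is already implied by foss and Python.
toy_edges = {
    "GROMACS": {"foss", "Python", "GCCcore"},
    "foss": {"GCCcore"},
    "Python": {"GCCcore"},
    "GCCcore": set(),
}

reduced = transitive_reduction(toy_edges)
print(sorted(reduced["GROMACS"]))
```

Recomputing the descendants for every successor is quadratic in the worst case, which is fine for a sketch but is exactly the kind of redundant work a graph library could take over.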

In short, a call

eb MUMPS-5.2.1-foss-2020a-metis.eb GROMACS-2020.1-foss-2020a-Python-3.8.2.eb LAMMPS-3Mar2020-foss-2020a-Python-3.8.2-kokkos.eb --dep-graph-layers -r --terse

will result in output like this:

M4-1.4.18.eb
Bison-3.3.2.eb help2man-1.47.4.eb
Bison-3.5.3.eb flex-2.6.4.eb zlib-1.2.11.eb
binutils-2.34.eb
GCCcore-9.3.0.eb
M4-1.4.18-GCCcore-9.3.0.eb
Bison-3.5.3-GCCcore-9.3.0.eb help2man-1.47.12-GCCcore-9.3.0.eb
flex-2.6.4-GCCcore-9.3.0.eb zlib-1.2.11-GCCcore-9.3.0.eb
binutils-2.34-GCCcore-9.3.0.eb
ncurses-6.2-GCCcore-9.3.0.eb
expat-2.2.9-GCCcore-9.3.0.eb libreadline-8.0-GCCcore-9.3.0.eb
Perl-5.30.2-GCCcore-9.3.0.eb
Autoconf-2.69-GCCcore-9.3.0.eb
Automake-1.16.1-GCCcore-9.3.0.eb libtool-2.4.6-GCCcore-9.3.0.eb ncurses-6.1.eb
Autotools-20180311-GCCcore-9.3.0.eb gettext-0.20.1.eb
xorg-macros-1.19.2-GCCcore-9.3.0.eb XZ-5.2.5-GCCcore-9.3.0.eb
libpciaccess-0.16-GCCcore-9.3.0.eb libxml2-2.9.10-GCCcore-9.3.0.eb numactl-2.0.13-GCCcore-9.3.0.eb pkg-config-0.29.2-GCCcore-9.3.0.eb
GCC-9.3.0.eb hwloc-2.2.0-GCCcore-9.3.0.eb UCX-1.8.0-GCCcore-9.3.0.eb
OpenMPI-4.0.3-GCC-9.3.0.eb
gompi-2020a.eb OpenBLAS-0.3.9-GCC-9.3.0.eb
bzip2-1.0.8-GCCcore-9.3.0.eb cURL-7.69.1-GCCcore-9.3.0.eb FFTW-3.3.8-gompi-2020a.eb ScaLAPACK-2.1.0-gompi-2020a.eb
CMake-3.16.4-GCCcore-9.3.0.eb foss-2020a.eb
METIS-5.1.0-GCCcore-9.3.0.eb SCOTCH-6.0.9-gompi-2020a.eb
MUMPS-5.2.1-foss-2020a-metis.eb

M4-1.4.18.eb
Bison-3.3.2.eb help2man-1.47.4.eb
Bison-3.5.3.eb flex-2.6.4.eb zlib-1.2.11.eb
binutils-2.34.eb
GCCcore-9.3.0.eb
M4-1.4.18-GCCcore-9.3.0.eb
Bison-3.5.3-GCCcore-9.3.0.eb help2man-1.47.12-GCCcore-9.3.0.eb
flex-2.6.4-GCCcore-9.3.0.eb zlib-1.2.11-GCCcore-9.3.0.eb
binutils-2.34-GCCcore-9.3.0.eb
ncurses-6.2-GCCcore-9.3.0.eb
expat-2.2.9-GCCcore-9.3.0.eb libreadline-8.0-GCCcore-9.3.0.eb
Perl-5.30.2-GCCcore-9.3.0.eb
Autoconf-2.69-GCCcore-9.3.0.eb
Automake-1.16.1-GCCcore-9.3.0.eb libtool-2.4.6-GCCcore-9.3.0.eb ncurses-6.1.eb
Autotools-20180311-GCCcore-9.3.0.eb gettext-0.20.1.eb
xorg-macros-1.19.2-GCCcore-9.3.0.eb XZ-5.2.5-GCCcore-9.3.0.eb
libpciaccess-0.16-GCCcore-9.3.0.eb libxml2-2.9.10-GCCcore-9.3.0.eb numactl-2.0.13-GCCcore-9.3.0.eb pkg-config-0.29.2-GCCcore-9.3.0.eb
GCC-9.3.0.eb hwloc-2.2.0-GCCcore-9.3.0.eb UCX-1.8.0-GCCcore-9.3.0.eb
OpenMPI-4.0.3-GCC-9.3.0.eb
gompi-2020a.eb OpenBLAS-0.3.9-GCC-9.3.0.eb
bzip2-1.0.8-GCCcore-9.3.0.eb cURL-7.69.1-GCCcore-9.3.0.eb FFTW-3.3.8-gompi-2020a.eb ScaLAPACK-2.1.0-gompi-2020a.eb
CMake-3.16.4-GCCcore-9.3.0.eb foss-2020a.eb
Tcl-8.6.10-GCCcore-9.3.0.eb
GMP-6.2.0-GCCcore-9.3.0.eb libffi-3.3-GCCcore-9.3.0.eb SQLite-3.31.1-GCCcore-9.3.0.eb
Eigen-3.3.7-GCCcore-9.3.0.eb Python-3.8.2-GCCcore-9.3.0.eb
pybind11-2.4.3-GCCcore-9.3.0-Python-3.8.2.eb
SciPy-bundle-2020.03-foss-2020a-Python-3.8.2.eb
networkx-2.4-foss-2020a-Python-3.8.2.eb scikit-build-0.10.0-foss-2020a-Python-3.8.2.eb
GROMACS-2020.1-foss-2020a-Python-3.8.2.eb

M4-1.4.18.eb
Bison-3.3.2.eb help2man-1.47.4.eb
Bison-3.5.3.eb flex-2.6.4.eb zlib-1.2.11.eb
binutils-2.34.eb
GCCcore-9.3.0.eb
M4-1.4.18-GCCcore-9.3.0.eb
Bison-3.5.3-GCCcore-9.3.0.eb help2man-1.47.12-GCCcore-9.3.0.eb
flex-2.6.4-GCCcore-9.3.0.eb zlib-1.2.11-GCCcore-9.3.0.eb
binutils-2.34-GCCcore-9.3.0.eb
ncurses-6.2-GCCcore-9.3.0.eb
expat-2.2.9-GCCcore-9.3.0.eb libreadline-8.0-GCCcore-9.3.0.eb
Perl-5.30.2-GCCcore-9.3.0.eb
Autoconf-2.69-GCCcore-9.3.0.eb
Automake-1.16.1-GCCcore-9.3.0.eb libtool-2.4.6-GCCcore-9.3.0.eb ncurses-6.1.eb
Autotools-20180311-GCCcore-9.3.0.eb gettext-0.20.1.eb
xorg-macros-1.19.2-GCCcore-9.3.0.eb XZ-5.2.5-GCCcore-9.3.0.eb
libpciaccess-0.16-GCCcore-9.3.0.eb libxml2-2.9.10-GCCcore-9.3.0.eb numactl-2.0.13-GCCcore-9.3.0.eb pkg-config-0.29.2-GCCcore-9.3.0.eb
GCC-9.3.0.eb hwloc-2.2.0-GCCcore-9.3.0.eb UCX-1.8.0-GCCcore-9.3.0.eb
OpenMPI-4.0.3-GCC-9.3.0.eb
gompi-2020a.eb OpenBLAS-0.3.9-GCC-9.3.0.eb
bzip2-1.0.8-GCCcore-9.3.0.eb cURL-7.69.1-GCCcore-9.3.0.eb FFTW-3.3.8-gompi-2020a.eb ScaLAPACK-2.1.0-gompi-2020a.eb
CMake-3.16.4-GCCcore-9.3.0.eb foss-2020a.eb
Tcl-8.6.10-GCCcore-9.3.0.eb
GMP-6.2.0-GCCcore-9.3.0.eb libffi-3.3-GCCcore-9.3.0.eb SQLite-3.31.1-GCCcore-9.3.0.eb
Eigen-3.3.7-GCCcore-9.3.0.eb Python-3.8.2-GCCcore-9.3.0.eb
pybind11-2.4.3-GCCcore-9.3.0-Python-3.8.2.eb
SciPy-bundle-2020.03-foss-2020a-Python-3.8.2.eb
libpng-1.6.37-GCCcore-9.3.0.eb
freetype-2.10.1-GCCcore-9.3.0.eb gperf-3.1-GCCcore-9.3.0.eb Ninja-1.10.0-GCCcore-9.3.0.eb util-linux-2.35-GCCcore-9.3.0.eb
fontconfig-2.13.92-GCCcore-9.3.0.eb gettext-0.20.1-GCCcore-9.3.0.eb intltool-0.51.0-GCCcore-9.3.0.eb Meson-0.53.2-GCCcore-9.3.0-Python-3.8.2.eb
X11-20200222-GCCcore-9.3.0.eb
gzip-1.10-GCCcore-9.3.0.eb lz4-1.9.2-GCCcore-9.3.0.eb Python-2.7.18-GCCcore-9.3.0.eb Tk-8.6.10-GCCcore-9.3.0.eb
gc-7.6.12-GCCcore-9.3.0.eb libdrm-2.4.100-GCCcore-9.3.0.eb libglvnd-1.2.0-GCCcore-9.3.0.eb libunistring-0.9.10-GCCcore-9.3.0.eb libunwind-1.3.1-GCCcore-9.3.0.eb LLVM-9.0.1-GCCcore-9.3.0.eb Mako-1.1.2-GCCcore-9.3.0.eb Szip-2.1.1-GCCcore-9.3.0.eb Tkinter-3.8.2-GCCcore-9.3.0.eb zstd-1.4.4-GCCcore-9.3.0.eb
Doxygen-1.8.17-GCCcore-9.3.0.eb Guile-1.8.8-GCCcore-9.3.0.eb HDF5-1.10.6-gompi-2020a.eb matplotlib-3.2.1-foss-2020a-Python-3.8.2.eb Mesa-20.0.2-GCCcore-9.3.0.eb NASM-2.14.02-GCCcore-9.3.0.eb pkgconfig-1.5.1-GCCcore-9.3.0-Python-3.8.2.eb Yasm-1.3.0-GCCcore-9.3.0.eb
Boost-1.72.0-gompi-2020a.eb FriBidi-1.0.9-GCCcore-9.3.0.eb GSL-2.6-GCC-9.3.0.eb h5py-2.10.0-foss-2020a-Python-3.8.2.eb LAME-3.100-GCCcore-9.3.0.eb libGLU-9.0.1-GCCcore-9.3.0.eb libmatheval-1.1.11-GCCcore-9.3.0.eb molmod-1.4.5-foss-2020a-Python-3.8.2.eb netCDF-4.7.4-gompi-2020a.eb x264-20191217-GCCcore-9.3.0.eb x265-3.3-GCCcore-9.3.0.eb
archspec-0.1.0-GCCcore-9.3.0-Python-3.8.2.eb FFmpeg-4.2.2-GCCcore-9.3.0.eb kim-api-2.1.3-foss-2020a.eb libjpeg-turbo-2.0.4-GCCcore-9.3.0.eb PCRE-8.44-GCCcore-9.3.0.eb PLUMED-2.6.0-foss-2020a-Python-3.8.2.eb ScaFaCoS-1.0.1-foss-2020a.eb tbb-2020.1-GCCcore-9.3.0.eb Voro++-0.4.6-GCCcore-9.3.0.eb VTK-8.2.0-foss-2020a-Python-3.8.2.eb yaff-1.6.0-foss-2020a-Python-3.8.2.eb
LAMMPS-3Mar2020-foss-2020a-Python-3.8.2-kokkos.eb

where each line represents one layer (from lowest to highest) and each paragraph represents one target. Ideally, as many of the lower layers as possible are identical across targets, so that the corresponding container images can be shared.
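As a rough illustration of how this output format can be consumed, the Python sketch below splits such text into targets and layers and counts how many leading layers two targets have in common. The function names are hypothetical, and the sample text is a heavily shortened stand-in for real output, not something `eb` printed.

```python
def parse_layer_output(text):
    """Split layer output into targets; each target is a list of layers,
    each layer a list of easyconfig file names."""
    targets, current = [], []
    for line in text.splitlines():
        if line.startswith("=="):
            # skip eb log lines, which are prefixed with '=='
            continue
        if line.strip():
            current.append(line.split())
        elif current:
            # a blank line closes the current target
            targets.append(current)
            current = []
    if current:
        targets.append(current)
    return targets


def shared_layer_count(first, second):
    """Number of identical layers at the bottom of two targets."""
    count = 0
    for layer_a, layer_b in zip(first, second):
        if layer_a != layer_b:
            break
        count += 1
    return count


# Shortened, made-up sample in the same format as the output above
sample = """\
== some log line
M4-1.4.18.eb
Bison-3.3.2.eb help2man-1.47.4.eb
MUMPS-5.2.1-foss-2020a-metis.eb

M4-1.4.18.eb
Bison-3.3.2.eb help2man-1.47.4.eb
GROMACS-2020.1-foss-2020a-Python-3.8.2.eb
"""

targets = parse_layer_output(sample)
print(len(targets), shared_layer_count(targets[0], targets[1]))
```

The shared count is the number of container images that only need to be built once for both targets.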

This bash script builds stacked container images, as an example of what can be done with such layering:

#!/bin/bash
set -euo pipefail

bootstrap_image="shahzebmsiddiqui/default/easybuild:centos-7"

declare -A levels=([DEBUG]=0 [INFO]=1 [WARN]=2 [ERROR]=3)
LOG_LEVEL="WARN"
DRY_RUN=

log_msg() {
    local log_priority=$1
    local log_message=$2

    # check if level exists ('+set' avoids an unbound-variable error under 'set -u')
    [[ -n ${levels[$log_priority]+set} ]] || return 1

    #check if level is enough
    if (( ${levels[$log_priority]} >= ${levels[$LOG_LEVEL]} )); then
        echo "${log_priority} : ${log_message}"
    fi
}

usage() {
  echo -n "
Usage: $(basename "$0") [-dhnv] [--image BOOTSTRAP_IMAGE] [EASY_CONFIG [EASY_CONFIG [ ... ]]]

Build stacked container images in current working directory from BOOTSTRAP_IMAGE (default: ${bootstrap_image}).

Expects environment variable SINGULARITY_TMPDIR to be set.
"
}

function join_by { local IFS="$1"; shift; echo "$*"; }

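# 'image:' must be declared as a long option taking an argument, otherwise --image is rejected;
# the exit status is tested directly because 'set -e' would abort before a later '$?' check
if ! args=$(getopt -n "$0" -l "help,verbose,debug,dry-run,image:" -o "hvdn" -- "$@"); then
    echo "Failed parsing options." >&2
    exit 1
fi
eval set -- "$args"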

while true; do
  case "$1" in
    -h | --help ) usage ; exit 0 ;;
    --image) bootstrap_image=$2; shift; shift;;
    -n | --dry-run ) DRY_RUN=true; shift;;
    -v | --verbose ) LOG_LEVEL=INFO; shift ;;
    -d | --debug ) LOG_LEVEL=DEBUG; shift ;;
    -- ) shift; break ;;
    * ) break ;;
  esac
done

# positional arguments
EASY_CONFIGS=$@

mkdir -p "$(pwd)/sources"
mkdir -p /tmp/easybuild/
ln -sf "$(pwd)/sources" /tmp/easybuild/sources

# always cap concatenated recipe and image names at maximum file name length - reserved length
NAME_MAX=$(getconf NAME_MAX .)
RESERVED_LENGTH=32
MAX_NAME_LENGTH=$(( ${NAME_MAX} - ${RESERVED_LENGTH} ))
log_msg INFO "system max filename length: ${NAME_MAX}"
log_msg INFO "derived max name length: ${MAX_NAME_LENGTH}"

# print some information
if (( ${levels[$LOG_LEVEL]} <= ${levels["INFO"]} )); then
    for ec in ${EASY_CONFIGS[@]}; do
        cmd="eb ${ec} -Dr"
        log_msg INFO "exec: ${cmd}"
        ${cmd}
    done
    cmd="eb ${EASY_CONFIGS[@]} --dep-graph-layers -r --debug"
    log_msg INFO "exec: ${cmd}"
    ${cmd}
fi

# even with '--terse', eb prints log lines prefixed with '=='; drop only those lines
cmd="eb ${EASY_CONFIGS[@]} --dep-graph-layers -r --terse"
log_msg INFO "exec: ${cmd}"
${cmd} | grep -v '^==' > eb_layer_lists.txt

previous_layer=
while IFS= read -r layer; do
    if [ -n "${layer}" ]; then
        IFS=' ' read -r -a ecs <<< "$layer"
        log_msg INFO "layer: ${layer}"

        ec_basenames=$(for ec in "${ecs[@]}"; do basename "$ec" ".eb"; done)

        image_name="$(join_by _ ${ec_basenames[@]})"
        log_msg INFO "full name: ${image_name}"

        if (( ${#image_name} > ${MAX_NAME_LENGTH} )); then
            image_name="${image_name:0:${MAX_NAME_LENGTH}}"
            log_msg INFO "capped name: ${image_name}"
        fi

        image_file="${image_name}.sif"
        log_msg INFO "image: ${image_file}"

        if [ -f "${image_file}" ]; then
            log_msg INFO "skipped: '${image_file}' exists already."
        else
            cmd="eb ${layer[@]} --fetch --sourcepath /tmp/easybuild/sources"
            log_msg INFO "exec: ${cmd}"
            if [ -z "${DRY_RUN}" ]; then ${cmd}; fi

            # if previous layer empty, then we are at the beginning of the dependency chain, build new image
            if [ -z "${previous_layer}" ]; then
                cmd="eb -C --container-build-image ${ecs[@]} --containerpath $(pwd) \
                    --container-config bootstrap=library,from=${bootstrap_image},eb_args='-l' \
                    --experimental --force --container-image-name ${image_name} \
                    --container-image-format sif --container-tmpdir ${SINGULARITY_TMPDIR}"
                log_msg INFO "exec: ${cmd}"
                if [ -z "${DRY_RUN}" ]; then ${cmd}; fi
            else
                IFS=' ' read -r -a previous_ecs <<< "$previous_layer"
                previous_ec_basenames=$(for ec in "${previous_ecs[@]}"; do basename "$ec" ".eb"; done)

                previous_image_name="$(join_by _ ${previous_ec_basenames[@]})"
                log_msg INFO "full previous name: ${previous_image_name}"

                if (( ${#previous_image_name} > ${MAX_NAME_LENGTH} )); then
                    previous_image_name="${previous_image_name:0:${MAX_NAME_LENGTH}}"
                    log_msg INFO "capped previous name: ${previous_image_name}"
                fi

                previous_image_file="${previous_image_name}.sif"
                log_msg INFO "previous image: ${previous_image_file}"

                cmd="eb -C --container-build-image ${ecs[@]} --containerpath $(pwd) \
                    --container-config bootstrap=localimage,from=${previous_image_file},eb_args='-l' \
                    --experimental --force --container-image-name ${image_name} \
                    --container-image-format sif --container-tmpdir ${SINGULARITY_TMPDIR}"
                log_msg INFO "exec: ${cmd}"
                if [ -z "${DRY_RUN}" ]; then ${cmd}; fi
            fi
        fi
    else
        log_msg INFO "Reached target ${previous_layer}."
    fi
    previous_layer=${layer}
done < eb_layer_lists.txt

jotelha avatar Jul 04 '20 00:07 jotelha

@jotelha I will try to take a look at this soon...but it is a big PR so it might take me a bit of time

ocaisa avatar Jul 19 '20 19:07 ocaisa

@boegel before you actually add that to a release, I will probably have to rewrite the dep_graph_partition function to be based on some "proper" partitioning algorithm, as mentioned above (and add some tests). I won't have time to work on that until the second half of August, thus @ocaisa please take your time with reviewing the current state.

jotelha avatar Jul 26 '20 18:07 jotelha

@jotelha I think it makes sense to add tests first before we take a closer look at this.

Some refactoring may be needed after review, but having tests will help us to make sense of it I think (which doesn't mean it's complicated code or anything, I haven't taken a close look at it yet).

Maybe even setting up a call to discuss this makes sense, to tackle this efficiently?

boegel avatar Sep 30 '20 07:09 boegel

Sure. I have not managed to come back to that pull request yet, but I hope to add some tests soon, and I think a call would make sense after that. I will try to get back to you on that within, say, two weeks.

jotelha avatar Sep 30 '20 21:09 jotelha