Python Wheel Install Command

Open nfx opened this issue 5 years ago • 3 comments

Allows installing wheels onto Databricks clusters through the standard Python setuptools framework, via a custom setup.py command registered under the distutils.commands entry point. For example:

python setup.py databricks_install --cluster-id abcd --databricks-cli-profile staging

will do the following automatically:

  1. build wheel
  2. use the staging profile from the CLI, or throw an error with instructions to configure it
  3. upload it to DBFS location (configurable as well)
  4. install it on cluster abcd as whl library
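
Steps 3 and 4 can be sketched without touching a live workspace. The helpers below are hypothetical (they are not taken from this PR): one derives the DBFS destination for a locally built wheel, the other builds the request body that the Libraries API 2.0 install endpoint expects. The default upload directory `dbfs:/FileStore/wheels` is an assumption for illustration only.

```python
import posixpath
from pathlib import PurePath


def dbfs_destination(wheel_path: str, dbfs_dir: str = "dbfs:/FileStore/wheels") -> str:
    """Return the DBFS path a locally built wheel would be uploaded to.

    `dbfs_dir` is an assumed default, not the location used by the PR.
    """
    # Keep only the file name of the wheel and append it to the DBFS directory.
    return posixpath.join(dbfs_dir.rstrip("/"), PurePath(wheel_path).name)


def install_request(cluster_id: str, dbfs_wheel: str) -> dict:
    """Build the JSON body for POST /api/2.0/libraries/install."""
    return {"cluster_id": cluster_id, "libraries": [{"whl": dbfs_wheel}]}


if __name__ == "__main__":
    target = dbfs_destination("dist/dbxdemo-0.1-py3-none-any.whl")
    print(install_request("abcd", target))
```

Keeping path and payload construction in pure functions makes them easy to unit-test separately from the upload and the HTTP call.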

TODO:

  1. wait until library is successfully installed or throw error
  2. install library on cluster by name
  3. install library on clusters by tag (e.g. team tags)
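
TODO item 1 could look roughly like the polling loop below. This is a minimal sketch, not the PR's implementation: `get_statuses` is a hypothetical callable assumed to return the `library_statuses` list from the Libraries API cluster-status response, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable, Iterable, Mapping


def wait_for_install(
    get_statuses: Callable[[], Iterable[Mapping]],
    wheel: str,
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until `wheel` is INSTALLED on the cluster, or raise.

    `get_statuses` is a hypothetical hook; each entry it yields is assumed to
    look like {"library": {"whl": ...}, "status": ..., "messages": [...]}.
    """
    deadline = time.monotonic() + timeout
    while True:
        for entry in get_statuses():
            # Ignore other libraries attached to the same cluster.
            if entry.get("library", {}).get("whl") != wheel:
                continue
            status = entry.get("status")
            if status == "INSTALLED":
                return
            if status == "FAILED":
                raise RuntimeError(
                    f"Installing {wheel} failed: {entry.get('messages', [])}"
                )
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{wheel} was not installed within {timeout} seconds")
        sleep(interval)
```

Injecting `get_statuses` and `sleep` keeps the loop independent of any particular API client and lets tests run without real delays.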

References:
https://setuptools.readthedocs.io/en/latest/setuptools.html#adding-commands
https://books.google.nl/books?id=9G9zX_jf1f8C&pg=PT236&lpg=PT236&dq=entry_points+distutils.commands&source=bl&ots=_4deWhAJIf&sig=ACfU3U0LgNjqMdOVTc2zNbdwpWlNr43xkg&hl=en&sa=X&ved=2ahUKEwjzl9TFnsjoAhVNDewKHSlkDUgQ6AEwA3oECAsQKA#v=onepage&q=entry_points%20distutils.commands&f=false

nfx avatar Apr 01 '20 22:04 nfx

Codecov Report

Merging #286 into master will decrease coverage by 0.08%. The diff coverage is 78.18%.

@@            Coverage Diff             @@
##           master     #286      +/-   ##
==========================================
- Coverage   83.53%   83.45%   -0.09%     
==========================================
  Files          33       34       +1     
  Lines        2211     2266      +55     
==========================================
+ Hits         1847     1891      +44     
- Misses        364      375      +11     

Impacted Files                          Coverage Δ
setup.py                                0.00% <ø> (ø)
databricks_cli/libraries/distutils.py   77.77% <77.77%> (ø)
databricks_cli/utils.py                 98.00% <100.00%> (+2.08%) ↑

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f92a8c3...4a85751.

codecov-io avatar Apr 02 '20 08:04 codecov-io

Once this functionality is ready, we can change our https://docs.databricks.com/dev-tools/ci-cd.html doc from

stage('Package') {
    sh """#!/bin/bash

          # Enable Conda environment for tests
          source ${CONDAPATH}/bin/activate ${CONDAENV}

          # Package Python library to wheel
          cd ${LIBRARYPATH}/python/dbxdemo
          python3 setup.py sdist bdist_wheel
       """
  }
  stage('Build Artifact') {
    sh """mkdir -p ${BUILDPATH}/Workspace
          mkdir -p ${BUILDPATH}/Libraries/python
          mkdir -p ${BUILDPATH}/Validation/Output
          #Get modified files
          git diff --name-only --diff-filter=AMR HEAD^1 HEAD | xargs -I '{}' cp --parents -r '{}' ${BUILDPATH}

          # Get packaged libs
          find ${LIBRARYPATH} -name '*.whl' | xargs -I '{}' cp '{}' ${BUILDPATH}/Libraries/python/

          # Generate artifact
          tar -czvf Builds/latest_build.tar.gz ${BUILDPATH}
       """
    archiveArtifacts artifacts: 'Builds/latest_build.tar.gz'
  }
  stage('Deploy') {
    sh """#!/bin/bash
          # Enable Conda environment for tests
          source ${CONDAPATH}/bin/activate ${CONDAENV}

          # Use Databricks CLI to deploy notebooks
          databricks workspace import_dir ${BUILDPATH}/Workspace ${WORKSPACEPATH}

          dbfs cp -r ${BUILDPATH}/Libraries/python ${DBFSPATH}
       """
    withCredentials([string(credentialsId: DBTOKEN, variable: 'TOKEN')]) {
        sh """#!/bin/bash

              #Get space delimited list of libraries
              LIBS=\$(find ${BUILDPATH}/Libraries/python/ -name '*.whl' | sed 's#.*/##' | paste -sd " ")

              #Script to uninstall, reboot if needed & install library
              python3 ${SCRIPTPATH}/installWhlLibrary.py --workspace=${DBURL}\
                        --token=$TOKEN\
                        --clusterid=${CLUSTERID}\
                        --libs=\$LIBS\
                        --dbfspath=${DBFSPATH}
           """
    }
  }

to python3 setup.py sdist databricks_install --cluster-id $CLUSTERID.

As mentioned by @ccstevens, this is for cluster-wide libraries and not for notebook-scoped libraries. CI/CD processes don't make sense for the latter.

nfx avatar Apr 03 '20 17:04 nfx

@nfx Do you want to merge this, or functionality similar to this, or should I take the request for review as an FYI?

pietern avatar May 09 '22 12:05 pietern