Databricks Labs
cicd-templates
Manage your Databricks deployments and CI with code.
dbx
🧱 Databricks CLI eXtensions (aka dbx) is a CLI tool for development and advanced Databricks workflow management.
tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation
automl-toolkit
Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and se...
dataframe-rules-engine
Extensible Rules Engine for custom DataFrame / Dataset validation
overwatch
Capture deep metrics on one or all assets within a Databricks workspace
jupyterlab-integration
DEPRECATED: Integrating Jupyter with Databricks via SSH
geoscan
Geospatial clustering at massive scale
dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POC...
databricks-sync
An experimental tool to synchronize a source Databricks deployment with a target Databricks deployment.