dag: Support dynamic git clone for sub-workflows
Add support for executing sub-workflows directly from a GitHub repository by performing a full git clone. This will allow users to reference workflows stored in GitHub and execute them without manual downloads.
Example:
steps:
  - name: run sub-dag from GitHub
    run:
      repo: "https://github.com/owner/repo.git" # Git URL
      ref: "main" # Optional branch, tag, or commit SHA
      path: "path/to/dag.yaml" # Path to the DAG file in the repo
      token: "${GITHUB_TOKEN}" # Optional, for private repositories
@yottahmd Hi, can I work on this issue?
Hi @halalala222, thank you! I think the design of this functionality still lacks clarity; for example, where the file should be cloned, or how we should handle other files in the repository. I'm not sure if this task is ready to be implemented just yet. What do you think?
Hi @halalala222. Could the Sparse Checkout feature be used to pull only the specified files? For example:
- Run `git config core.sparsecheckout true` to enable Sparse Checkout.
- Write the corresponding file paths into the `.git/info/sparse-checkout` file.
- `git pull origin main`
- `git checkout main`
- `ls`
What do you think?
Thank you for the great idea! That could definitely be a viable solution.
However, I’d prefer to avoid requiring users to configure their global Git settings just to clone a repo in a DAG, if possible.
I'm considering implementing a simplified version of the checkout action as a git-checkout executor as a first step before implementing the original functionality stated in this issue.
I’ve drafted a design document with ChatGPT (apologies, it’s a bit long). I believe we could begin by implementing a very minimal version of this.
What do you think?
Git Checkout Executor Design Document
Overview
This document proposes the addition of a new executor type for Dagu called git-checkout, which supports checking out Git repositories as a first-class workflow step, similar to GitHub Actions' actions/checkout.
Motivation
Many workflows begin with fetching source code from Git. Currently, users must implement custom shell steps for this purpose, which leads to repetitive, error-prone DAG definitions. By introducing a native executor, we:
- Improve UX through clean and declarative YAML
- Reuse cached repositories efficiently
- Handle authentication securely
- Enhance interoperability with future artifact and guardrail features
YAML Specification
steps:
  - name: fetch source
    executor:
      type: git-checkout
      config:
        repo: dagu-org/dagu # <owner>/<repo> or full URL
        ref: v1.8.0 # branch, tag, or SHA
        path: ./workspace # relative or absolute path
        depth: 1 # shallow clone
        submodules: false # initialize submodules?
        lfs: false # fetch LFS objects?
        clean: true # clean working copy before use
        cache: true # cache mirror in ~/.cache/dagu
        auth:
          method: token # token | ssh | none
          tokenEnv: GITHUB_TOKEN # environment variable for token
          sshKey: ~/.ssh/id_rsa # SSH key path if method=ssh
    output: WORKSPACE # exposed variable
Execution Semantics
Mirror Initialization
- Cache mirror to `~/.cache/dagu/git/mirrors/github.com/<owner>/<repo>.git`
- Use `flock` to prevent concurrent corruption
Working Copy
- If `path` exists and `clean: true`: run `git clean -ffd` and `git reset --hard`
- Otherwise, clone with `--reference` from mirror
- Checkout `ref` (or `--detach` if SHA)
- If enabled: `git submodule update` and `git lfs pull`
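The working-copy steps above could be planned as an ordered list of git argument lists before anything is executed. This is only a sketch: `checkoutOpts` and `planWorkingCopy` are hypothetical names, only full 40-character SHAs are treated as commits, and the `--init --recursive` flags on `git submodule update` are an assumption beyond what the list states.

```go
package main

import "regexp"

// fullSHA matches a 40-character hex commit ID. Abbreviated SHAs are not
// detected; that simplification is an assumption of this sketch.
var fullSHA = regexp.MustCompile(`^[0-9a-fA-F]{40}$`)

// checkoutOpts is a hypothetical subset of the executor configuration.
type checkoutOpts struct {
	URL, Mirror, Path, Ref string
	Clean, Submodules, LFS bool
}

// planWorkingCopy returns the arguments for each git invocation, in order.
// pathExists reports whether o.Path is already present on disk.
func planWorkingCopy(o checkoutOpts, pathExists bool) [][]string {
	var cmds [][]string
	if pathExists && o.Clean {
		// Reuse the existing working copy after discarding local changes.
		cmds = append(cmds,
			[]string{"-C", o.Path, "clean", "-ffd"},
			[]string{"-C", o.Path, "reset", "--hard"},
		)
	} else {
		// Borrow objects from the local mirror instead of refetching them.
		cmds = append(cmds, []string{"clone", "--reference", o.Mirror, o.URL, o.Path})
	}
	if o.Ref != "" {
		if fullSHA.MatchString(o.Ref) {
			cmds = append(cmds, []string{"-C", o.Path, "checkout", "--detach", o.Ref})
		} else {
			cmds = append(cmds, []string{"-C", o.Path, "checkout", o.Ref})
		}
	}
	if o.Submodules {
		cmds = append(cmds, []string{"-C", o.Path, "submodule", "update", "--init", "--recursive"})
	}
	if o.LFS {
		cmds = append(cmds, []string{"-C", o.Path, "lfs", "pull"})
	}
	return cmds
}
```

Each returned slice could then be passed to `exec.CommandContext(ctx, "git", args...)`. Keeping planning separate from execution makes the sequence easy to unit test without a real repository.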
Output
- Set the absolute path of the working directory as the step's output variable
Implementation Plan
Config Struct
type GitCheckoutConfig struct {
	Repo       string
	Ref        string
	Path       string
	Depth      int
	Submodules bool
	LFS        bool
	Clean      bool
	Cache      bool
	Auth       struct {
		Method   string
		TokenEnv string
		SSHKey   string
	}
}
Executor Skeleton
type gitCheckoutExecutor struct {
	cfg GitCheckoutConfig
}

func (e *gitCheckoutExecutor) Run(ctx context.Context, stepCtx dagu.ExecutorContext) error {
	// Resolve URL and mirror path
	// Clone or reuse mirror
	// Create or clean working copy
	// Checkout, init submodules, pull LFS if needed
	// stepCtx.SetOutput("WORKSPACE", absPath)
	return nil
}
Auth Handling
- Token: embed in HTTPS URL
- SSH: configure `GIT_SSH_COMMAND`
- No secrets are logged or passed in CLI args
Integration with Dagu Features
| Feature | Integration |
|---|---|
| Artifacts | Optionally mark path as artifact on step success |
| Guardrails | Future: checksum validation before use |
| Local-first | Offline reuse via mirror cache |
| UI/Logs | Show clone progress, repo and ref summary |
Security Considerations
- Use environment variables for secrets
- Block `..` in `path` to prevent directory traversal
- Isolate each checkout within DAG sandbox if needed
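The `..` check could be as small as the following sketch. `validateCheckoutPath` is a hypothetical name, and rejecting every `..` segment (rather than resolving the path against a sandbox root) is an assumption about how strict the rule should be.

```go
package main

import (
	"errors"
	"path/filepath"
	"strings"
)

// errTraversal is returned when the configured path contains a ".." segment.
var errTraversal = errors.New(`path must not contain ".." segments`)

// validateCheckoutPath rejects any path with a ".." segment. Names that merely
// contain dots, such as "a..b", are allowed.
func validateCheckoutPath(p string) error {
	for _, seg := range strings.Split(filepath.ToSlash(p), "/") {
		if seg == ".." {
			return errTraversal
		}
	}
	return nil
}
```

For example, `./workspace` passes while `../etc` and `a/../../b` are rejected.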
Future Extensions
- Bitbucket, GitLab support
- Sparse checkouts
- Release asset downloads
- Patch application support
@yottahmd Hi! I think this is a really great idea!! I will try to implement a minimum version of it.
Thank you, @halalala222 ! This feature will be extremely useful. Please feel free to reach out if you have any questions. By the way, do you know of any good Go libraries for integrating with Git repositories?
@yottahmd Hi! Does the go-git repository count?
Thanks @halalala222, yes, go-git looks like a great choice 👍