dag: Support dynamic git clone for sub-workflows

Open yottahmd opened this issue 11 months ago • 8 comments

Add support for executing sub-workflows directly from a GitHub repository by performing a full git clone. This will allow users to reference workflows stored in GitHub and execute them without manual downloads.

Example:

steps:
  - name: run sub-dag from GitHub
    run:
      repo: "https://github.com/owner/repo.git"  # Git URL
      ref: "main"                 # Optional branch, tag, or commit SHA
      path: "path/to/dag.yaml"    # Path to the DAG file in the repo
      token: "${GITHUB_TOKEN}"    # Optional, for private repositories

yottahmd avatar Dec 30 '24 15:12 yottahmd

@yottahmd Hi, can I work on this issue?

liooooo29 avatar Apr 23 '25 11:04 liooooo29

Hi @halalala222, thank you! I think the design of this functionality still lacks clarity; for example, where the file should be cloned, or how we should handle other files in the repository. I'm not sure if this task is ready to be implemented just yet. What do you think?

yottahmd avatar Apr 23 '25 13:04 yottahmd

Hi @halalala222. Could Git's sparse checkout feature be used to pull only the specified files? For example:

  1. Run git config core.sparsecheckout true to enable sparse checkout.

  2. Write the corresponding file paths into the .git/info/sparse-checkout file.

  3. Run git pull origin main.

  4. Run git checkout main.

  5. Run ls to list the checked-out files.

What do you think?

liooooo29 avatar Apr 24 '25 02:04 liooooo29

Thank you for the great idea! That could definitely be a viable solution.

However, I’d prefer to avoid requiring users to configure their global Git settings just to clone a repo in a DAG, if possible.

As a first step, before implementing the functionality originally described in this issue, I'm considering implementing a simplified version of the checkout action as a git-checkout executor.

I’ve drafted a design document with ChatGPT (apologies, it’s a bit long). I believe we could begin by implementing a very minimal version of this.

What do you think?


Git Checkout Executor Design Document

Overview

This document proposes the addition of a new executor type for Dagu called git-checkout, which supports checking out Git repositories as a first-class workflow step, similar to GitHub Actions' actions/checkout.

Motivation

Many workflows begin with fetching source code from Git. Currently, users must implement custom shell steps for this purpose, which leads to repetitive, error-prone DAG definitions. By introducing a native executor, we:

  • Improve UX through clean and declarative YAML
  • Reuse cached repositories efficiently
  • Handle authentication securely
  • Enhance interoperability with future artifact and guardrail features

YAML Specification

steps:
  - name: fetch source
    executor:
      type: git-checkout
      config:
        repo: dagu-org/dagu           # <owner>/<repo> or full URL
        ref: v1.8.0                   # branch, tag, or SHA
        path: ./workspace             # relative or absolute path
        depth: 1                      # shallow clone
        submodules: false            # initialize submodules?
        lfs: false                   # fetch LFS objects?
        clean: true                  # clean working copy before use
        cache: true                  # cache mirror in ~/.cache/dagu
        auth:
          method: token              # token | ssh | none
          tokenEnv: GITHUB_TOKEN     # environment variable for token
          sshKey: ~/.ssh/id_rsa      # SSH key path if method=ssh
    output: WORKSPACE                # exposed variable
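
The repo field accepts either <owner>/<repo> or a full URL, so the executor would need to normalize it before cloning. Here is a minimal sketch in Go, assuming the shorthand expands to an HTTPS URL on github.com (the spec does not say which host or scheme the shorthand implies); resolveRepoURL is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveRepoURL (hypothetical) turns the repo field into a clone URL.
// Assumption: "<owner>/<repo>" means https://github.com/<owner>/<repo>.git.
func resolveRepoURL(repo string) (string, error) {
	repo = strings.TrimSpace(repo)
	if repo == "" {
		return "", errors.New("repo is empty")
	}
	// Full URLs (https://, ssh://, ...) and scp-style git@host:path pass through.
	if strings.Contains(repo, "://") || strings.HasPrefix(repo, "git@") {
		return repo, nil
	}
	parts := strings.Split(repo, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("repo %q is neither a URL nor <owner>/<repo>", repo)
	}
	name := strings.TrimSuffix(parts[1], ".git")
	return "https://github.com/" + parts[0] + "/" + name + ".git", nil
}

func main() {
	for _, r := range []string{"dagu-org/dagu", "https://github.com/owner/repo.git", "bad"} {
		u, err := resolveRepoURL(r)
		fmt.Println(u, err)
	}
}
```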

Execution Semantics

Mirror Initialization

  • Cache mirror to ~/.cache/dagu/git/mirrors/github.com/<owner>/<repo>.git
  • Use flock to prevent concurrent corruption

Working Copy

  • If path exists and clean: true: run git clean -ffd and git reset --hard
  • Otherwise, clone with --reference from mirror
  • Checkout ref (or --detach if SHA)
  • If enabled: git submodule update and git lfs pull

Output

  • Set the absolute path of the working directory as the step's output variable

Implementation Plan

Config Struct

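type GitCheckoutConfig struct {
  Repo       string // <owner>/<repo> or full URL
  Ref        string // branch, tag, or SHA
  Path       string // relative or absolute checkout path
  Depth      int    // shallow clone depth
  Submodules bool   // initialize submodules
  LFS        bool   // fetch LFS objects
  Clean      bool   // clean working copy before use
  Cache      bool   // cache mirror in ~/.cache/dagu
  Auth       struct {
    Method   string // token | ssh | none
    TokenEnv string // environment variable holding the token
    SSHKey   string // SSH key path if Method is ssh
  }
}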

Executor Skeleton

type gitCheckoutExecutor struct {
  cfg GitCheckoutConfig
}

func (e *gitCheckoutExecutor) Run(ctx context.Context, stepCtx dagu.ExecutorContext) error {
  // Resolve URL and mirror path
  // Clone or reuse mirror
  // Create or clean working copy
  // Checkout, init submodules, pull LFS if needed
  // stepCtx.SetOutput("WORKSPACE", absPath)
  return nil
}

Auth Handling

  • Token: embed in HTTPS URL
  • SSH: configure GIT_SSH_COMMAND
  • No secrets are logged or passed in CLI args
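
The auth bullets above can be illustrated with a small Go sketch. All function names are hypothetical, and the "x-access-token" username is an assumption (GitHub accepts it for token-based HTTPS access; other hosts may expect something else). The key path is assumed to be absolute already. The sketch only builds values; how the URL reaches git without appearing in process arguments is left open.

```go
package main

import (
	"fmt"
	"net/url"
)

// tokenURL (hypothetical) embeds a token into an HTTPS clone URL.
func tokenURL(repoURL, token string) (string, error) {
	u, err := url.Parse(repoURL)
	if err != nil {
		return "", err
	}
	if u.Scheme != "https" {
		return "", fmt.Errorf("token auth needs an https URL, got scheme %q", u.Scheme)
	}
	u.User = url.UserPassword("x-access-token", token) // username is an assumption
	return u.String(), nil
}

// redactURL returns a form of the URL that is safe to log:
// url.URL.Redacted replaces any password with "xxxxx".
func redactURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "<unparseable URL>"
	}
	return u.Redacted()
}

// sshEnv (hypothetical) builds the environment entry that makes git use a
// specific key. keyPath is assumed to be an absolute path.
func sshEnv(keyPath string) string {
	return fmt.Sprintf("GIT_SSH_COMMAND=ssh -i %q -o IdentitiesOnly=yes", keyPath)
}

func main() {
	u, err := tokenURL("https://github.com/owner/repo.git", "s3cr3t")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(redactURL(u)) // log only the redacted form
	fmt.Println(sshEnv("/home/user/.ssh/id_rsa"))
}
```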

Integration with Dagu Features

Feature      Integration
Artifacts    Optionally mark path as artifact on step success
Guardrails   Future: checksum validation before use
Local-first  Offline reuse via mirror cache
UI/Logs      Show clone progress, repo and ref summary

Security Considerations

  • Use environment variables for secrets
  • Block .. in path to prevent directory traversal
  • Isolate each checkout within DAG sandbox if needed
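
The ".." rule above could look roughly like the following Go sketch. validateCheckoutPath is a hypothetical name, and the check is deliberately narrow: it rejects any ".." path segment and nothing else, so sandboxing and symlink handling would still need separate treatment.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// validateCheckoutPath (hypothetical) rejects empty paths and any path that
// contains a ".." segment. Names that merely contain dots, such as "a..b",
// are allowed.
func validateCheckoutPath(p string) error {
	if p == "" {
		return errors.New("path is empty")
	}
	for _, seg := range strings.Split(filepath.ToSlash(p), "/") {
		if seg == ".." {
			return fmt.Errorf("path %q must not contain a \"..\" segment", p)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"./workspace", "../etc", "a/../../b", "a..b"} {
		fmt.Printf("%-12s %v\n", p, validateCheckoutPath(p))
	}
}
```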

Future Extensions

  • Bitbucket, GitLab support
  • Sparse checkouts
  • Release asset downloads
  • Patch application support

yottahmd avatar Apr 24 '25 03:04 yottahmd

@yottahmd Hi! I think this is a really great idea!! I will try to implement a minimum version of it.

liooooo29 avatar Apr 24 '25 07:04 liooooo29

Thank you, @halalala222 ! This feature will be extremely useful. Please feel free to reach out if you have any questions. By the way, do you know of any good Go libraries for integrating with Git repositories?

yottahmd avatar Apr 24 '25 07:04 yottahmd

@yottahmd Hi! Would go-git count?

liooooo29 avatar Apr 24 '25 08:04 liooooo29

Thanks @halalala222, yes, go-git looks like a great choice 👍

yottahmd avatar Apr 24 '25 09:04 yottahmd