
Custom artifact repository plugins

Open jalberti opened this issue 4 years ago • 2 comments

Summary

There are several GitHub issues that discuss volumes and how they relate to artifact management, e.g. https://github.com/argoproj/argo-workflows/issues/1024 and https://github.com/argoproj/argo-workflows/issues/1349. Today the supported artifact repository types are fixed, and extensions are only feasible in-tree, if I understand this correctly. I'm wondering whether there has been any thought on how CSI, or a CSI-like concept, could be used to allow artifact repositories to be provided as plugins?

The problem at the moment is, unless I have overlooked something, that while I can get volumes into the pod and into my workflow, write arbitrary code, and even run sidecars as demonstrated in https://github.com/argoproj/argo-workflows/issues/4988, if my input or output does not fit the standard supported artifact repository set, I cannot use the concept of artifact declaration. Essentially, the "declaration" of input/output is lost. If one were able to bring their own repository, we could use the concept of artifacts together with custom code, i.e. we would have both extensibility and declarative artifacts.

E.g. we could define a stable JSON-RPC interface and allow community-provided artifact repositories in the form of a sidecar container. The artifact repository provider would build the container image with the JSON-RPC server, and Argo would act as the JSON-RPC client. The Argo workflow would define an artifact repository of type jsonrpc, and the artifact repository driver container and the workflow step container would share a common volumeMount so that artifact content does not need to pass through the JSON-RPC pipe.

Use Cases

This would be useful to add support for custom community supported artifact repositories.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

jalberti avatar May 07 '21 22:05 jalberti

This fits in well with our plugin architecture. I'm not sure if the best way to do this is a sidecar, or if it should be done using binaries embedded in the image.

alexec avatar Apr 18 '22 15:04 alexec

This would also add support for #1540.

alexec avatar Apr 18 '22 15:04 alexec

Here is my proposal to implement this.

Core Technical Constraints

The plugin architecture must operate within Argo Workflows' execution model, where short-lived pods handle workflow steps, requiring near-instantaneous startup without runtime dependency resolution. Init containers must complete artifact downloads before the main container begins execution. Plugins should not use shared volumes for artifact transfer, both because files may be multiple gigabytes in size and because shared storage is not guaranteed to be available for this.

A single workflow pod can access and store multiple artifacts, and each artifact, whether input or output, may use a different artifact driver.

Where the artifact code is used:

  • argo-server (argocli image): ArtifactServer (server/artifacts/artifact_server.go)
  • argoexec: WorkflowExecutor (workflow/executor/executor.go)
    • For use during init as an initContainer to download artifacts
      • Downloaded artifacts are stored in a shared local volume
    • For use by wait as a sidecar to upload artifacts
    • Artifact GC

Requirements

  • Runtime Independence:
    • Zero plugin downloads during pod initialization
    • Exclusive use of pre-built Docker images leveraging Kubernetes’ cluster-wide image caching
    • Elimination of any extra shared volume dependencies for artifact transfer
  • Language Ecosystem Support:
    • First-class support for Go (primary implementation language)
    • Required compatibility with Python, Java, and Rust
    • Optional support for additional languages without architectural constraints
  • Artifact Handling Capabilities:
    • Direct filesystem access to pod-local storage paths
    • Bidirectional artifact transfer (download/upload)
    • Multi-gigabyte file handling via streaming, with constant memory overhead
    • No network-based data transfer for artifact contents
  • Execution Context Requirements:
    • Init container (argoexec) execution for artifact download
    • Wait container (argoexec) execution for artifact upload
    • Artifact GC container (argoexec) for deletion
    • Argo-server (argocli) execution for artifact browsing
  • Security Boundaries:
    • Process isolation between plugin and main application
    • Filesystem sandboxing for untrusted plugins

Interface to the plugin

WASM

I have evaluated using WASM but

  • WASM sandboxing restricts network and file system access; providing those capabilities would probably require the WASI standard for interfacing, which is still quite a young standard.
  • Compiling Java or Python to WASM is non-trivial and would add significant complexity to the plugin development experience.

GRPC

We already use GRPC elsewhere and it works well here, so continuing with it is the best fit.

The existing golang interface looks like this:

// ArtifactDriver is the interface for loading and saving of artifacts
type ArtifactDriver interface {
    // Load accepts an artifact source URL and places it at specified path
    Load(inputArtifact *v1alpha1.Artifact, path string) error

    // OpenStream opens an artifact for reading. If the artifact is a file,
    // then the file should be opened. If the artifact is a directory, the
    // driver may return that as a tarball. OpenStream is intended to be efficient,
    // so implementations should minimise usage of disk, CPU and memory.
    // Implementations must not implement retry mechanisms; retries are handled
    // by the caller, so retrying here as well would result in O(nm) cost.
    OpenStream(a *v1alpha1.Artifact) (io.ReadCloser, error)

    // Save uploads the path to artifact destination
    Save(path string, outputArtifact *v1alpha1.Artifact) error

    Delete(artifact *v1alpha1.Artifact) error

    ListObjects(artifact *v1alpha1.Artifact) ([]string, error)

    IsDirectory(artifact *v1alpha1.Artifact) (bool, error)
}

v1alpha1.Artifact is defined in https://github.com/argoproj/argo-workflows/blob/d3cfe9ecc337e28ea5be7b2e8adee9735bea88f4/pkg/apis/workflow/v1alpha1/workflow_types.go#L960 and is already available as protocol buffer types.

Converting these methods to GRPC should be simple.

  • All of the methods except for OpenStream are straightforward, and will just transfer small amounts of data over GRPC, having direct access to storage and network to perform their task.

Here are the protobuf definitions for the GRPC interface:

syntax = "proto3";

// Artifact Service
//
// Artifact Service API provides GRPC access to artifact operations
package artifact;

// These imports assume the Argo Workflows proto layout; the Artifact message
// is already generated in pkg/apis/workflow/v1alpha1.
import "google/api/annotations.proto";
import "pkg/apis/workflow/v1alpha1/generated.proto";

message LoadArtifactRequest {
  github.com.argoproj.argo_workflows.v3.pkg.apis.workflow.v1alpha1.Artifact input_artifact = 1;
  string path = 2;
}

message LoadArtifactResponse {
  bool success = 1;
  string error = 2;
}

message OpenStreamRequest {
  github.com.argoproj.argo_workflows.v3.pkg.apis.workflow.v1alpha1.Artifact artifact = 1;
}

message OpenStreamResponse {
  bytes data = 1;
  bool is_end = 2;
  string error = 3;
}

message SaveArtifactRequest {
  string path = 1;
  github.com.argoproj.argo_workflows.v3.pkg.apis.workflow.v1alpha1.Artifact output_artifact = 2;
}

message SaveArtifactResponse {
  bool success = 1;
  string error = 2;
}

message DeleteArtifactRequest {
  github.com.argoproj.argo_workflows.v3.pkg.apis.workflow.v1alpha1.Artifact artifact = 1;
}

message DeleteArtifactResponse {
  bool success = 1;
  string error = 2;
}

message ListObjectsRequest {
  github.com.argoproj.argo_workflows.v3.pkg.apis.workflow.v1alpha1.Artifact artifact = 1;
}

message ListObjectsResponse {
  repeated string objects = 1;
  string error = 2;
}

message IsDirectoryRequest {
  github.com.argoproj.argo_workflows.v3.pkg.apis.workflow.v1alpha1.Artifact artifact = 1;
}

message IsDirectoryResponse {
  bool is_directory = 1;
  string error = 2;
}

service ArtifactService {
  rpc Load(LoadArtifactRequest) returns (LoadArtifactResponse) {
    option (google.api.http) = {
      post: "/api/v1/artifacts/load"
      body: "*"
    };
  }

  rpc OpenStream(OpenStreamRequest) returns (stream OpenStreamResponse) {
    option (google.api.http) = {
      post: "/api/v1/artifacts/stream"
      body: "*"
    };
  }

  rpc Save(SaveArtifactRequest) returns (SaveArtifactResponse) {
    option (google.api.http) = {
      post: "/api/v1/artifacts/save"
      body: "*"
    };
  }

  rpc Delete(DeleteArtifactRequest) returns (DeleteArtifactResponse) {
    option (google.api.http) = {
      post: "/api/v1/artifacts/delete"
      body: "*"
    };
  }

  rpc ListObjects(ListObjectsRequest) returns (ListObjectsResponse) {
    option (google.api.http) = {
      post: "/api/v1/artifacts/list"
      body: "*"
    };
  }

  rpc IsDirectory(IsDirectoryRequest) returns (IsDirectoryResponse) {
    option (google.api.http) = {
      post: "/api/v1/artifacts/is-directory"
      body: "*"
    };
  }
}

OpenStream

The OpenStream method is problematic.

OpenStream transfers a stream of data from the artifact directly to the client, but we now have GRPC in the way, causing a double hop: artifact store → plugin → GRPC → caller.

We should measure the performance of this double hop and see if it is a problem.

Configuring

System configuration

The workflow-controller-configmap will contain a new field artifactDrivers which will be a list of artifact drivers. This is a list of mappings of driver name to driver image e.g.

artifactDrivers:
- name: plugin-x
  image: quay.io/argoproj/artifact-plugin-x:v0.1.0

This allows the argo-server to run all of the available plugins as sidecars, and allows the system administrator to control which plugins are available. Changing this list should cause the argo-server to restart.

It also allows updating the plugin versions and retaining access to the previously stored artifacts.

(These are reasons why the image isn’t just passed into the v1alpha1.Artifact as a parameter)

The artifactDrivers entries should also accept a containerSpec for their various uses, so that things like resources can be set.
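A sketch of what that could look like, assuming a nested container field (the field name and its placement are assumptions, not settled):

```yaml
artifactDrivers:
- name: plugin-x
  image: quay.io/argoproj/artifact-plugin-x:v0.1.0
  container:            # hypothetical field: a partial container spec
    resources:
      requests:
        cpu: 100m
        memory: 64Mi
      limits:
        memory: 256Mi
```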

Using a plugin

ArtifactLocation is defined in https://github.com/argoproj/argo-workflows/blob/d3cfe9ecc337e28ea5be7b2e8adee9735bea88f4/pkg/apis/workflow/v1alpha1/workflow_types.go#L1167

This currently has a hardcoded list of builtin artifact drivers.

A new entry will be added to this of a driver called Plugin. This will have two parameters:

  • name: The name of the plugin
  • configuration: A string of the configuration for the plugin. This will be passed to the plugin as-is, but we strongly encourage plugins to use YAML format. It is up to the plugin to decide how to use this configuration; it will be passed in each request as part of the v1alpha1.Artifact.
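For illustration, an input artifact using such a driver might look like this (the plugin field name and the configuration keys are hypothetical):

```yaml
inputs:
  artifacts:
  - name: my-data
    path: /tmp/my-data
    plugin:                 # hypothetical new ArtifactLocation entry
      name: plugin-x
      configuration: |
        bucket: my-bucket
        region: eu-west-1
```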

How to obtain and run the plugin

Requirements:

  • Plugins can be cached
  • No plugin downloads during pod initialization
    • For speed of startup
    • Very high frequency of plugin initialization
  • Non-compiled plugins (e.g. Python) need to be runnable, and downloading a bundle is more complex

By leveraging Kubernetes image caching, we can cache each plugin by delivering it as an image that includes all of its dependencies.

Option 1: Custom Image

Principal downsides:

  • Requires two custom images to be built and deployed for every version of Argo Workflows to embed the workflows binaries.
  • Process isolation between the plugin and the main application is not as strong as with a sidecar.

Principal upside:

  • Simpler to implement on the Argo Workflows side

Option 2: Sidecar image

Principal downsides:

  • Init containers cannot have sidecars, so we would need to run two init containers, the first of which provides the argoexec binary to the plugin container via a shared volume.
  • In future, an image volume could do this: https://kubernetes.io/docs/concepts/storage/volumes/#image but image volumes are not yet GA in Kubernetes.

Principal upside:

  • No need to build and deploy a custom image

Selected proposal:

Use option 2 (sidecar image), especially as we can leverage image volumes in the future.

All Argo Workflows binaries are standalone, so copying the binaries into another image is straightforward.

How to implement a plugin

The plugin is a Docker image containing a GRPC server that implements the artifact.ArtifactService interface. The entrypoint of the image is the GRPC server, which should take a single argument: the socket to listen on. (This is very much subject to change.)

How to import a .proto file into your language of choice is language dependent, but there are many examples.

The GRPC server only needs to handle a single request at a time. The GRPC server must restart on failure.
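A minimal sketch of such an entrypoint in Go. The default socket path is illustrative, and the commented-out GRPC wiring assumes generated stubs (artifact.RegisterArtifactServiceServer) that do not exist yet:

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// listenUnix opens the plugin's listen socket, removing any stale socket
// file first so that the mandated restart-on-failure succeeds.
func listenUnix(path string) (net.Listener, error) {
	_ = os.Remove(path) // ignore "file does not exist"
	return net.Listen("unix", path)
}

func main() {
	// The single argument is the socket to listen on; the default is illustrative.
	path := "/tmp/artifact-plugin.sock"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	lis, err := listenUnix(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer lis.Close()
	fmt.Println("listening on", lis.Addr())
	// With generated stubs, the server would then be wired up roughly as:
	//   srv := grpc.NewServer()
	//   artifact.RegisterArtifactServiceServer(srv, &myDriver{})
	//   srv.Serve(lis)
}
```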

Implementation in the Argo Workflows project

Argo Server, main container, and Artifact GC usage

In these cases the plugin will be run as a sidecar. The sidecar name will be artifact-plugin-<name>, and the GRPC messages will be sent via a unix socket of the same name on a shared volume.

In the Go code, the Plugin driver will call the relevant GRPC method on the plugin via the unix socket.
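A sketch of the driver side, following the artifact-plugin-&lt;name&gt; naming convention above. The /var/run/argo mount point is an assumption for the shared volume, and the commented line shows roughly what the equivalent GRPC dial would be:

```go
package main

import (
	"fmt"
	"net"
	"path/filepath"
)

// socketPath returns the unix socket for a named plugin sidecar, following
// the artifact-plugin-<name> convention. /var/run/argo stands in for the
// shared volume's mount point.
func socketPath(pluginName string) string {
	return filepath.Join("/var/run/argo", "artifact-plugin-"+pluginName+".sock")
}

// dialPlugin opens the connection to the plugin. With GRPC this would be
// roughly:
//   grpc.Dial("unix://"+socketPath(name), grpc.WithTransportCredentials(insecure.NewCredentials()))
func dialPlugin(pluginName string) (net.Conn, error) {
	return net.Dial("unix", socketPath(pluginName))
}

func main() {
	fmt.Println(socketPath("plugin-x")) // /var/run/argo/artifact-plugin-plugin-x.sock
}
```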

The server must come pre-configured with the sidecars, as the deployment is managed by the user. argo-helm will help; otherwise this will just be documented. A sidecar that is in the configmap but missing from the deployment will cause argo-server to fail to start.

Init Container

The init container case must be different due to the lack of sidecars.

The first initContainer will be argoexec and will copy the argoexec binary into a shared volume. The second (and possibly subsequent) initContainers will be the plugin images, and will run the argoexec binary from the shared volume, calling the GRPC methods on the plugin via the unix socket to download the artifacts for that plugin. The workflow-controller will look up the entrypoint for the plugin and pass it to these initContainers for argoexec to run in the background.

(There will be one initContainer for each plugin that’s used in the pod’s input artifacts, hence “possibly subsequent”.)
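Sketched as a pod spec fragment, where the container names, mount paths, image tags, and argoexec subcommand are all illustrative:

```yaml
initContainers:
- name: init                       # standard argoexec init container
  image: quay.io/argoproj/argoexec:latest          # illustrative tag
  command: [cp, /bin/argoexec, /argo/bin/argoexec] # copy binary to shared volume
  volumeMounts:
  - name: argo-bin
    mountPath: /argo/bin
- name: artifact-plugin-plugin-x   # one per plugin used by input artifacts
  image: quay.io/argoproj/artifact-plugin-x:v0.1.0
  # argoexec starts the plugin entrypoint in the background, then drives it
  # over the unix socket; the subcommand and flag are hypothetical.
  command: [/argo/bin/argoexec, artifact-load, --plugin=plugin-x]
  volumeMounts:
  - name: argo-bin
    mountPath: /argo/bin
  - name: var-run-argo
    mountPath: /var/run/argo
```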

Considerations

The plugin server may not be ready as soon as the caller, so this should be handled gracefully and retried.

Joibel avatar Jun 24 '25 08:06 Joibel