Proposal: Executable Agent Skills

Open keithagroves opened this issue 5 days ago • 0 comments

Summary

Current Agent Skills excel at instruction, but they lack a native execution model. This forces users (or agents) to manually manage environments, dependencies, and runtime safety—creating friction, security risks, and reproducibility failures.

This proposal introduces a minimal, optional extension to the SKILL.md frontmatter that enables secure, containerized, reproducible execution of skills—without breaking existing skills or changing the authoring model.

The design is intentionally small: three execution fields (from, build, command) plus two schemas (inputSchema, outputSchema). Together, they transform a skill from a README an agent follows into a capability an agent can execute.

Skills tell the Agent what to do. Executable manifests let the Agent actually do it.


The Gap: The "Instruction-Only" Problem

Today, skills stop at instruction. Execution is delegated to the user.

Resulting Issues

  1. Dependency Hell: Users must manually install packages (pip, npm, cargo) matching the skill's requirements.
  2. Environment Pollution: Global dependencies conflict across different skills.
  3. No Isolation: Scripts run on the host machine, risking accidental damage (rm -rf) or data leakage.
  4. Non-Reproducibility: OS differences and version drift cause "works on my machine" failures.
  5. No Contracts: Inputs and outputs are undocumented and unvalidated, leading to hallucinated arguments.

The Solution: Executable Manifests

We extend the SKILL.md frontmatter with optional execution fields.

Proposed Frontmatter Extension

---
name: pdf-extract
description: Extract text from PDF files using OCR or text layers.

# EXECUTION FIELDS (OPTIONAL)
from: python:3.12-slim
build: pip install pypdf pdfplumber
command: python /work/extract.py ${file}

inputSchema:
  type: object
  properties:
    file:
      type: string
      description: Path to PDF file
  required: [file]

outputSchema:
  type: object
  properties:
    text:
      type: string
---

If these fields are present, the skill is executable. If not, the skill behaves exactly as it does today (pure documentation).
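
The presence check described here could be sketched as follows. This is only an illustration: it assumes the frontmatter has already been parsed into a dictionary, and it assumes that `from` and `command` together are what make a skill executable (the proposal does not say which fields are strictly required). `is_executable` is a hypothetical helper name.

```python
# Hypothetical helper: the proposal does not define this function.
# Assumption: a skill counts as executable when both `from` and `command`
# are present and non-empty; `build` and the schemas stay optional.
REQUIRED_FOR_EXECUTION = ("from", "command")


def is_executable(frontmatter: dict) -> bool:
    """Return True if the parsed SKILL.md frontmatter declares an execution model."""
    return all(frontmatter.get(field) for field in REQUIRED_FOR_EXECUTION)


instruction_only = {
    "name": "pdf-extract",
    "description": "Extract text from PDF files using OCR or text layers.",
}
executable = {
    **instruction_only,
    "from": "python:3.12-slim",
    "command": "python /work/extract.py ${file}",
}

print(is_executable(instruction_only))  # False
print(is_executable(executable))        # True
```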

Field Definitions

| Field | Purpose | Why It Matters |
| --- | --- | --- |
| from | Container base image | Guarantees identical runtime everywhere. |
| build | One-time setup (cached) | Automates dependency installation. |
| command | Deterministic entrypoint | Defines explicit execution behavior. |
| inputSchema | JSON Schema validation | Prevents hallucinated inputs before execution. |
| outputSchema | JSON Schema validation | Enables reliable output parsing and chaining. |

This is the smallest useful surface area that unlocks reliable execution.
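
The proposal does not spell out how `${file}`-style placeholders in `command` are filled in. One plausible reading, sketched below, is shell-style substitution with every value quoted first, so that an argument cannot smuggle in extra shell syntax. `render_command` is a hypothetical name, and the quoting strategy is an assumption rather than part of the proposal.

```python
import shlex
from string import Template


def render_command(command: str, args: dict) -> str:
    """Substitute ${name} placeholders, shell-quoting every value first."""
    quoted = {key: shlex.quote(str(value)) for key, value in args.items()}
    # substitute() raises KeyError if the command references a missing argument.
    return Template(command).substitute(quoted)


template = "python /work/extract.py ${file}"
print(render_command(template, {"file": "report.pdf"}))
# python /work/extract.py report.pdf
print(render_command(template, {"file": "q3 report.pdf; rm -rf /"}))
# python /work/extract.py 'q3 report.pdf; rm -rf /'
```

Passing arguments as a list to the container runtime, rather than building a shell string, would avoid the quoting question entirely; which approach an engine takes is a design choice left open here.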


Workflow: Before vs. After

Current State

User: "Extract text from report.pdf"

Agent:

  1. Reads README.
  2. Asks user to install pypdf.
  3. Writes a Python script to a temporary file.
  4. Asks user to run the script.

Result: High friction, low safety, zero reproducibility.

Proposed State

User: "Extract text from report.pdf"

Agent:

  1. Identifies pdf-extract skill.
  2. Validates input against inputSchema.
  3. Executes skill in a sandbox.
  4. Returns structured output.

Result: Zero friction, high safety, guaranteed reproducibility.
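
The validation step in this flow might look like the sketch below. It covers only a small subset of JSON Schema (`required`, plus `type` checks for a few types) and is not a substitute for a real validator. Rejecting unknown arguments is an extra assumption made here to catch hallucinated parameters; standard JSON Schema allows additional properties by default. `validate_input` is a hypothetical name.

```python
# Maps a few JSON Schema type names to Python types.
# Deliberately incomplete: any other declared type is not checked.
JSON_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list}


def validate_input(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments are accepted."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in properties:
            # Assumption: unknown arguments are rejected, not ignored.
            errors.append(f"unknown argument: {name}")
            continue
        expected = properties[name].get("type")
        if expected in JSON_TYPES and not isinstance(value, JSON_TYPES[expected]):
            errors.append(f"{name}: expected {expected}")
    return errors


pdf_input_schema = {
    "type": "object",
    "properties": {"file": {"type": "string", "description": "Path to PDF file"}},
    "required": ["file"],
}

print(validate_input(pdf_input_schema, {"file": "report.pdf"}))
# []
print(validate_input(pdf_input_schema, {"path": "report.pdf"}))
# ['missing required argument: file', 'unknown argument: path']
```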

Technical Architecture

The execution engine acts as a translation layer between the Agent and the Docker runtime.

graph TD
    A[Agent / User] -->|1. Invoke with Args| B(Execution Engine)
    B -->|2. Validate Input| C{Schema Valid?}
    C -->|No| D[Return Error]
    C -->|Yes| E[Build/Load Container]
    E -->|3. Execute Command| F[Sandboxed Runtime]
    F -->|4. Capture Output| B
    B -->|5. Return Structured Data| A
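
The control flow in the diagram can be sketched in a few lines. The validator and the sandbox runner are passed in as plain callables so the example runs without Docker; both stand-ins, the `invoke` name, and the assumption that the skill prints JSON on stdout are illustrative and not part of the proposal.

```python
import json
from typing import Callable


def invoke(args: dict,
           validate: Callable[[dict], list],
           run_in_sandbox: Callable[[dict], str]) -> dict:
    """Validate first; only start the sandbox when the arguments are accepted."""
    errors = validate(args)                        # step 2: validate input
    if errors:
        return {"ok": False, "errors": errors}     # "Return Error" branch
    raw_output = run_in_sandbox(args)              # steps 3-4: execute and capture
    # Assumption: the command writes a JSON document to stdout.
    return {"ok": True, "output": json.loads(raw_output)}  # step 5


def fake_validate(args: dict) -> list:
    """Stand-in validator: only checks that `file` is present."""
    return [] if "file" in args else ["missing required argument: file"]


def fake_sandbox(args: dict) -> str:
    """Stand-in for the container run; returns what the skill would print."""
    return json.dumps({"text": f"text of {args['file']}"})


print(invoke({"file": "report.pdf"}, fake_validate, fake_sandbox))
# {'ok': True, 'output': {'text': 'text of report.pdf'}}
print(invoke({}, fake_validate, fake_sandbox))
# {'ok': False, 'errors': ['missing required argument: file']}
```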

Security Model

  • Isolation: Ephemeral containers per execution. No host filesystem access (except explicitly mounted inputs).
  • Networking: Disabled by default; enabled only when explicitly requested.
  • Validation: All arguments are validated against the schema before the container starts.
  • Secrets: Injected at runtime via environment variables; never written to disk or logs.
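
As a rough sketch of how these rules could map onto the Docker CLI, the function below builds a `docker run` argument list. It assumes Docker is the runtime and uses only standard flags (`--rm`, `--network none`, read-only `-v` mounts, and `-e NAME` to forward a variable from the caller's environment). The function name, paths, and `API_TOKEN` are hypothetical, and the reference implementation may work differently.

```python
def docker_run_argv(image: str,
                    command: list[str],
                    input_mounts: dict[str, str],
                    secret_names: list[str],
                    allow_network: bool = False) -> list[str]:
    """Build (but do not execute) a `docker run` argument list."""
    argv = ["docker", "run", "--rm"]                 # --rm: container is removed on exit
    if not allow_network:
        argv += ["--network", "none"]                 # no networking unless requested
    for host_path, container_path in input_mounts.items():
        argv += ["-v", f"{host_path}:{container_path}:ro"]  # read-only input mounts
    for name in secret_names:
        # `-e NAME` with no value forwards the variable from the caller's
        # environment, so the secret never appears in the argument list.
        argv += ["-e", name]
    return argv + [image] + command


argv = docker_run_argv(
    image="python:3.12-slim",
    command=["python", "/work/extract.py", "/input/report.pdf"],
    input_mounts={"/home/user/report.pdf": "/input/report.pdf"},  # hypothetical paths
    secret_names=["API_TOKEN"],                                   # hypothetical secret
)
print(" ".join(argv))
# docker run --rm --network none -v /home/user/report.pdf:/input/report.pdf:ro -e API_TOKEN python:3.12-slim python /work/extract.py /input/report.pdf
```

The list could then be handed to `subprocess.run(argv, capture_output=True)`; passing a list rather than a shell string keeps arguments from being reinterpreted by a shell.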

Reference Implementation: Enact

Enact is a working, open-source implementation of this proposal. It demonstrates that combining documentation and execution in one file is viable.

1. Discovery (enact learn)

The "literate" aspect: agents can inspect the contract before execution. The command shows the manifest, schema, and human-readable documentation in one view.

$ enact learn enact/hello-python

enact/hello-python@1.0.3
───────────────────────────

---
name: "enact/hello-python"
version: "1.0.3"
description: "A simple Python greeting tool"
from: "python:3.12-slim"

inputSchema:
  type: object
  properties:
    name:
      type: string
      description: "Name to greet"
      default: "World"

command: "python /work/hello.py ${name}"
---

# Hello Python

A simple Python tool that greets you by name.

2. Execution (enact run)

Once the input is validated, the tool runs in a secure, ephemeral container.

$ enact run enact/hello-python --args '{"name": "Anthropic"}'

◇  ✓ Resolved: enact/hello-python
◐  Running enact/hello-python (python:3.12-slim)...
◇  ✓ Execution complete

Hello, Anthropic! 🐍
Generated at: 2025-12-19T15:33:38
Python version: 3.12.12
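
The hello-python manifest declares `default: "World"` for `name`. In JSON Schema, `default` is only an annotation, so an engine would need to apply it itself. The sketch below shows one way to do that before the command is rendered; whether Enact behaves this way is not stated in the proposal, and `apply_defaults` is a hypothetical name.

```python
import json


def apply_defaults(schema: dict, args: dict) -> dict:
    """Fill in schema-declared defaults for arguments the caller omitted."""
    resolved = dict(args)  # copy so the caller's dict is left untouched
    for name, spec in schema.get("properties", {}).items():
        if name not in resolved and "default" in spec:
            resolved[name] = spec["default"]
    return resolved


hello_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name to greet", "default": "World"},
    },
}

print(apply_defaults(hello_schema, json.loads('{"name": "Anthropic"}')))
# {'name': 'Anthropic'}
print(apply_defaults(hello_schema, json.loads("{}")))
# {'name': 'World'}
```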

Note: This proposal does not require adopting Enact itself, only the manifest semantics it demonstrates.


Backwards Compatibility

  • All execution fields are optional.
  • Existing skills continue to work unchanged.
  • No migration is required.
  • Instruction-only skills remain first-class citizens.

Closing

This proposal does not change what skills are. It changes what skills can do.

By adopting a minimal executable manifest, Agent Skills evolve from static documentation into real, composable, secure capabilities.

One file becomes a portable, executable unit of intelligence.

keithagroves, Dec 19 '25 17:12