GenericAgent
Self-evolving agent: grows a skill tree from a 3.3K-line seed, achieving full system control with 6x lower token consumption
English | 中文 (Chinese) | 📄 Technical Report | 📘 Tutorial | Sophub
📌 Official channel: This GitHub repository is the sole official source for GenericAgent. We have no affiliation with any third-party website using the GenericAgent name.
🌟 Overview
GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3K lines of code. Through 9 atomic tools + a ~100-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).
Its design philosophy: don't preload skills — evolve them.
Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a skill for direct reuse later. The longer you use it, the more skills accumulate, forming a skill tree that belongs entirely to you, grown from 3K lines of seed code.
🤖 Self-Bootstrap Proof: Everything in this repository, from installing Git and running `git init` to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.
📋 Core Features
- Self-Evolving: Automatically crystallizes each task into a skill. Capabilities grow with every use, forming your personal skill tree.
- Minimal Architecture: ~3K lines of core code. Agent Loop is ~100 lines. No complex dependencies, zero deployment overhead.
- Strong Execution: Injects into a real browser (preserving login sessions). 9 atomic tools take direct control of the system.
- High Compatibility: Supports Claude / Gemini / Kimi / MiniMax and other major models. Cross-platform.
- Token Efficient: Works within a context window under 30K tokens, a fraction of the 200K–1M that other agents consume. Layered memory keeps the right knowledge in scope: less noise, fewer hallucinations, and a higher success rate at a much lower cost.
🧬 Self-Evolution Mechanism
This is what fundamentally distinguishes GenericAgent from every other agent framework.
```text
[New Task] --> [Autonomous Exploration] (install deps, write scripts, debug & verify) -->
[Crystallize Execution Path into Skill] --> [Write to Memory Layer] --> [Direct Recall on Next Similar Task]
```
| What you say | What the agent does the first time | Every time after |
|---|---|---|
| "Read my WeChat messages" | Install deps → reverse DB → write read script → save skill | one-line invoke |
| "Monitor stocks and alert me" | Install mootdx → build selection flow → configure cron → save skill | one-line start |
| "Send this file via Gmail" | Configure OAuth → write send script → save skill | ready to use |
After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code.
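The crystallize-then-recall cycle can be illustrated with a minimal sketch. It assumes a skill is just a saved callable keyed by a normalized task description; `SkillLibrary`, `run_task`, and the exact-match lookup are hypothetical and do not reflect GenericAgent's actual skill format or recall logic.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical illustration only: a "skill" here is a saved zero-argument callable,
# not GenericAgent's real skill format.
Skill = Callable[[], str]


class SkillLibrary:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    @staticmethod
    def _key(task: str) -> str:
        # Naive exact-match normalization; real recall would match *similar* tasks.
        return " ".join(task.lower().split())

    def recall(self, task: str) -> Optional[Skill]:
        return self._skills.get(self._key(task))

    def crystallize(self, task: str, skill: Skill) -> None:
        self._skills[self._key(task)] = skill


def run_task(lib: SkillLibrary, task: str, explore: Callable[[str], Skill]) -> str:
    """Reuse a saved skill if one exists; otherwise explore once and save the result."""
    skill = lib.recall(task)
    if skill is None:
        skill = explore(task)         # expensive first run: install deps, write and debug scripts
        lib.crystallize(task, skill)  # crystallize the working path for next time
    return skill()


explored: List[str] = []


def explore(task: str) -> Skill:
    explored.append(task)  # record how often the slow path runs
    return lambda: f"done: {task}"


lib = SkillLibrary()
first = run_task(lib, "Send this file via Gmail", explore)
second = run_task(lib, "send this file  via gmail", explore)  # recalled, no exploration
```

The second call skips `explore` entirely, which is the "one-line invoke" column of the table in miniature.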
🎯 Demo Showcase
| 🧋 Food Delivery Order | 📈 Quantitative Stock Screening |
|---|---|
| "Order me a milk tea": navigates the delivery app, selects items, and completes checkout automatically. | "Find GEM stocks with EXPMA golden cross, turnover > 5%": screens stocks against quantitative conditions. |

| 🌐 Autonomous Web Exploration | 💰 Expense Tracking |
|---|---|
| Autonomously browses and periodically summarizes web content. | "Find expenses over ¥2K in the last 3 months": drives Alipay via ADB. |
📅 Latest News
- 2026-05-15: 🖥️ Desktop GUI released: one-line installs now ship a ready-to-run desktop app (`frontends/GenericAgent.exe`), while developers can launch it with `python launch.pyw`.
- 2026-05-14: 🆕 Conductor sub-agent orchestration: spawn, supervise, and auto-clean parallel sub-agents; first-class delegation primitives complementing `/btw` side-questions.
- 2026-05-12: 🆕 TUI v2 released (`frontends/tuiapp_v2.py`): a refined Textual frontend with image-paste folding, file paste, block delete, Ctrl+C copy, history navigation, and `/llm`, `/export`, and `/continue` pickers.
- 2026-04-21: 📄 Technical Report released on arXiv: "GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization"
- 2026-04-11: Introduced L4 session archive memory and scheduler cron integration
- 2026-03-23: Support personal WeChat as a bot frontend
- 2026-03-10: Released million-scale Skill Library
- 2026-03-08: Released "Dintal Claw" — a GenericAgent-powered government affairs bot
- 2026-03-01: GenericAgent featured by Jiqizhixin (机器之心)
- 2026-01-16: GenericAgent V1.0 public release
🚀 Quick Start
⚠️ Python version: use Python 3.11 or 3.12. Do not use Python 3.14; it is incompatible with `pywebview` and a few other GA dependencies.
📖 Detailed installation guide: installation.md · installation_zh.md (Chinese)
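If you script the install, a small pre-flight check can reject an unsupported interpreter early. This is a hypothetical helper, not part of GenericAgent; it encodes only the guidance above (3.11 and 3.12 recommended, 3.14 known to break) and reports every other version as untested rather than guessing.

```python
import sys

# Hypothetical pre-install check based only on the version guidance above.
RECOMMENDED = {(3, 11), (3, 12)}
KNOWN_BAD = {(3, 14)}


def check_python(version: tuple[int, int]) -> str:
    """Classify a (major, minor) Python version as 'ok', 'unsupported', or 'untested'."""
    if version in RECOMMENDED:
        return "ok"
    if version in KNOWN_BAD:
        return "unsupported"
    return "untested"


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor}: {check_python((major, minor))}")
```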
For LLM Agents
Fetch the installation guide and follow it:
```bash
curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/refs/heads/main/docs/installation.md
```
For Humans
Method 1: One-line install (recommended)
This installs GenericAgent with an isolated Python environment and Git, then downloads a ready-to-run package.
Windows PowerShell
```powershell
powershell -ExecutionPolicy Bypass -c "$env:GLOBAL=1; irm http://fudankw.cn:9000/files/ga_install.ps1 | iex"
```
Linux / macOS
```bash
GLOBAL=1 bash -c "$(curl -fsSL http://fudankw.cn:9000/files/ga_install.sh)"
```
After installation on Windows, launch the desktop app from `frontends/GenericAgent.exe`.
Method 2: Python install (for developers)
```bash
git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
uv venv
source .venv/bin/activate       # Windows: .venv\Scripts\activate
uv pip install -e ".[ui]"       # Core + UI dependencies
cp mykey_template.py mykey.py   # Fill in your LLM API key
python launch.pyw
```
GenericAgent is meant to grow its environment through the Agent itself, not by pre-installing every possible package.
Full guide: GETTING_STARTED.md
📖 Beginner's guide (illustrated): Feishu document
📘 Full getting-started tutorial (by Datawhale): Hello GenericAgent · GitHub
🖥️ Frontends
Desktop App
For one-line installs on Windows, double-click:
frontends/GenericAgent.exe
Terminal UI
A lightweight, keyboard-driven interface built on Textual. Supports multiple concurrent sessions and real-time streaming.
```bash
python frontends/tuiapp_v2.py
```
Windows TUI troubleshooting. TUI rendering on Windows can be flaky depending on the terminal and font. Common causes:
- `textual` is not on the latest version: run `pip install -U textual` first.
- PowerShell / cmd ship with terminals that have rough Unicode and key-binding support. Prefer Git Bash on Windows, which is much better behaved.
- If it still looks broken, ask GA itself to fix it:
  "My experience using `frontends/tuiapp_v2.py` in PowerShell / cmd / Git Bash on Windows is very poor, with lots of incompatibilities. Please refer to Claude Code's best practices for the Windows terminal and fix all font and rendering incompatibilities."
Streamlit UI
```bash
python launch.pyw
```
💬 Bot Interface (IM)
GenericAgent also supports IM frontends such as Telegram, WeChat, QQ, Feishu / Lark, WeCom, and DingTalk.
Typical usage:
```bash
python frontends/tgapp.py        # Telegram
python frontends/wechatapp.py    # WeChat
python frontends/qqapp.py        # QQ
python frontends/fsapp.py        # Feishu / Lark
python frontends/wecomapp.py     # WeCom
python frontends/dingtalkapp.py  # DingTalk
```
For detailed setup, ask GenericAgent itself.
Common chat commands:
- `/new`: start a fresh conversation and clear the current context
- `/continue`: list recoverable conversation snapshots
- `/continue N`: restore the Nth recoverable conversation
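The three chat commands can be parsed with a small dispatcher. This sketch is an assumption about how a frontend might interpret them (for example, that N is a 1-based positive integer); it is not taken from GenericAgent's bot code, and `Command` and `parse_command` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    name: str                    # "new", "continue", or "message"
    index: Optional[int] = None  # N for "/continue N" (assumed 1-based)
    text: str = ""


def parse_command(line: str) -> Command:
    parts = line.strip().split()
    if not parts:
        return Command("message", text="")
    if parts[0] == "/new" and len(parts) == 1:
        return Command("new")  # start fresh, clear context
    if parts[0] == "/continue":
        if len(parts) == 1:
            return Command("continue")  # list recoverable snapshots
        if len(parts) == 2 and parts[1].isdecimal() and int(parts[1]) >= 1:
            return Command("continue", index=int(parts[1]))  # restore the Nth one
    # Anything else is passed through to the agent as an ordinary message.
    return Command("message", text=line)
```

Malformed input such as `/continue 0` falls through to an ordinary message instead of raising, which keeps a chat bot from crashing on typos.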
📊 Comparison with Similar Tools
| Feature | GenericAgent | OpenClaw | Claude Code |
|---|---|---|---|
| Codebase | ~3K lines | ~530,000 lines | Open-sourced (large) |
| Deployment | pip install + API Key | Multi-service orchestration | CLI + subscription |
| Browser Control | Real browser (session preserved) | Sandbox / headless browser | Via MCP plugin |
| OS Control | Mouse/kbd, vision, ADB | Multi-agent delegation | File + terminal |
| Self-Evolution | Autonomous skill growth | Plugin ecosystem | Stateless between sessions |
| Out of the Box | A few core files + starter skills | Hundreds of modules | Rich CLI toolset |
📈 Evaluation — Five Dimensions
📂 Full evaluation datasets and results: https://github.com/JinyiHan99/GA-Technical-Report/tree/main
| Dimension | Question | Benchmarks used |
|---|---|---|
| 1. Task Completion & Token Efficiency | Can GA complete hard tasks more cheaply than leading agents? | SOP-Bench, Lifelong AgentBench, RealFin-Benchmark |
| 2. Tool-Use Efficiency | Can a minimal atomic toolset solve what specialized toolsets solve, with less overhead? | Tool Efficiency Benchmark (11 simple + 5 long-horizon tasks) |
| 3. Memory System Effectiveness | Does condensed hierarchical memory beat full/redundant memory and embedding-based retrievers? | SOP-Bench (dangerous goods), LoCoMo, 20-skill stress test |
| 4. Self-Evolution Capability | Can the agent distill experience into reusable SOPs and code, without intervention? | 9-round LangChain longitudinal study, 8-task cross-task web benchmark |
| 5. Web Browsing Capability | Does density-driven design survive the open web? | WebCanvas, BrowseComp-ZH, Custom Tasks (22) |
Baselines across these dimensions include Claude Code, OpenAI Codex, and OpenClaw, evaluated under Claude Sonnet 4.6, Claude Opus 4.6, GPT-5.4, and MiniMax M2.7 backbones.
[Figure: Tool-use efficiency radar. GA dominates the token, request, and tool-call axes while preserving quality across four task dimensions.]
[Figure: Cross-task self-evolution. Second- and third-run GA executions converge to a stable low-cost regime across eight web tasks, while OpenClaw shows no such convergence.]
🧠 How It Works
GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.
1️⃣ Layered Memory System
Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.
- L0 — Meta Rules: Core behavioral rules and system constraints of the agent
- L1 — Insight Index: Minimal memory index for fast routing and recall
- L2 — Global Facts: Stable knowledge accumulated over long-term operation
- L3 — Task Skills / SOPs: Reusable workflows for completing specific task types
- L4 — Session Archive: Archived task records distilled from finished sessions for long-horizon recall
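One way to picture the layers is a small store in which L0 is always loaded and L1 acts as a routing index deciding which L2 to L4 entries enter the context; everything else stays out, which is what keeps the window small. The `LayeredMemory` class and its keyword routing are hypothetical simplifications, not GenericAgent's actual memory files or recall logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LayeredMemory:
    meta_rules: List[str] = field(default_factory=list)                # L0: always loaded
    insight_index: Dict[str, List[str]] = field(default_factory=dict)  # L1: keyword -> entry ids
    entries: Dict[str, str] = field(default_factory=dict)              # L2-L4 content by id

    def add(self, entry_id: str, content: str, keywords: List[str]) -> None:
        """Store an L2/L3/L4 entry and register it under its keywords in the L1 index."""
        self.entries[entry_id] = content
        for kw in keywords:
            ids = self.insight_index.setdefault(kw.lower(), [])
            if entry_id not in ids:
                ids.append(entry_id)

    def build_context(self, task: str) -> List[str]:
        """Return L0 rules plus only the entries the L1 index routes this task to."""
        words = set(task.lower().split())
        selected: List[str] = []
        for kw, ids in self.insight_index.items():
            if kw in words:
                for entry_id in ids:
                    if entry_id not in selected:
                        selected.append(entry_id)
        return self.meta_rules + [self.entries[i] for i in selected]


mem = LayeredMemory(meta_rules=["L0: ask before deleting files"])
mem.add("L2/proxy", "L2: the office network requires a proxy", ["network", "download"])
mem.add("L3/gmail", "L3: SOP for sending mail via Gmail OAuth", ["gmail", "email"])
ctx = mem.build_context("send the report via gmail")  # the proxy fact is not loaded
```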
2️⃣ Autonomous Execution Loop
Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop
The entire core loop is just ~100 lines of code (agent_loop.py).
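The perceive, reason, act, remember cycle can be sketched in a few lines. Everything here is an assumption for illustration: `call_llm` stands in for a model call that returns either a tool request or a final answer, the `(action, argument)` decision format is invented, and none of it reproduces the contents of agent_loop.py.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A model decision is (tool_name, argument) or ("final", answer). Hypothetical format.
Decision = Tuple[str, str]


def agent_loop(
    task: str,
    call_llm: Callable[[List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    memory: List[str],
    max_turns: int = 20,
) -> Optional[str]:
    history = [f"task: {task}"]
    for _ in range(max_turns):
        # Perceive + reason: the model sees the task and every observation so far.
        action, arg = call_llm(history)
        if action == "final":
            memory.append(f"{task} -> {arg}")  # write experience to memory
            return arg
        # Execute the requested tool and feed the observation back in.
        tool = tools.get(action)
        observation = tool(arg) if tool else f"error: unknown tool {action!r}"
        history.append(f"{action}({arg!r}) -> {observation}")
    return None  # gave up after max_turns


def fake_llm(history: List[str]) -> Decision:
    # Scripted stand-in for a model: run one command, then answer with its output.
    if len(history) == 1:
        return ("code_run", "print(6 * 7)")
    return ("final", history[-1].split("-> ")[-1])


memory: List[str] = []
answer = agent_loop("compute 6*7", fake_llm, {"code_run": lambda src: "42"}, memory)
```

The `max_turns` cap is a design choice of this sketch so that a confused model cannot loop forever.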
3️⃣ Minimal Toolset
GenericAgent provides only 9 atomic tools, forming the foundational capabilities for interacting with the outside world.
| Tool | Function |
|---|---|
| code_run | Execute arbitrary code |
| file_read | Read files |
| file_write | Write files |
| file_patch | Patch / modify files |
| web_scan | Perceive web content |
| web_execute_js | Control browser behavior |
| ask_user | Human-in-the-loop confirmation |
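A minimal toolset mostly comes down to a small name-to-function table the model can call into. The sketch below implements only the three file tools with plain `pathlib` I/O to show the shape; the signatures, the `dispatch` helper, and the rule that `file_patch` must match exactly one occurrence are assumptions, not GenericAgent's real tool schema.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def file_read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def file_write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars to {path}"


def file_patch(path: str, old: str, new: str) -> str:
    """Replace one exact occurrence of `old`; refuse ambiguous or missing matches."""
    text = file_read(path)
    count = text.count(old)
    if count != 1:
        return f"error: expected exactly 1 match, found {count}"
    file_write(path, text.replace(old, new, 1))
    return "patched"


# Hypothetical registry: the model emits a tool name plus keyword arguments.
TOOLS: Dict[str, Callable[..., str]] = {
    "file_read": file_read,
    "file_write": file_write,
    "file_patch": file_patch,
}


def dispatch(name: str, **kwargs: str) -> str:
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**kwargs)


with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "demo.txt")
    dispatch("file_write", path=target, content="x = 1\n")
    patch_result = dispatch("file_patch", path=target, old="x = 1", new="x = 2")
    missing_result = dispatch("file_patch", path=target, old="y", new="z")
    final_text = dispatch("file_read", path=target)
```

Returning errors as strings rather than raising lets the model read the failure as an observation and try again.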
Additionally, 2 memory management tools (`update_working_checkpoint`, `start_long_term_update`) allow the agent to persist context and accumulate experience across sessions.
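The memory tools are only named above, so any implementation detail is a guess. This hypothetical sketch treats a working checkpoint as a small JSON file that is overwritten on each update and reloaded when a session resumes; `save_checkpoint`, `load_checkpoint`, and the file layout are invented for illustration and are not the real `update_working_checkpoint`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_checkpoint(path: Path, state: Dict[str, Any]) -> None:
    """Write to a temp file, then rename over the target (atomic on POSIX),
    so a crash mid-write does not leave a half-written checkpoint."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)  # overwrites an existing checkpoint


def load_checkpoint(path: Path) -> Dict[str, Any]:
    """Return the last saved state, or an empty one for a fresh session."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "working_checkpoint.json"
    fresh = load_checkpoint(ckpt)
    save_checkpoint(ckpt, {"task": "send report", "done_steps": ["login"]})
    resumed = load_checkpoint(ckpt)
```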
4️⃣ Capability Extension Mechanism
GenericAgent can create new tools dynamically.
Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.
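Turning a one-off script into a permanent tool can be sketched with the standard library alone: persist the generated source, import it with `importlib`, and register its entry point. The `tools/` directory, the `run` entry-point convention, and `register_generated_tool` are hypothetical; GenericAgent's own mechanism goes through `code_run` and may look quite different.

```python
import importlib.util
import tempfile
from pathlib import Path
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., object]] = {}


def register_generated_tool(tools_dir: Path, name: str, source: str) -> Callable[..., object]:
    """Persist generated source as <name>.py and register its `run` function as a tool."""
    tools_dir.mkdir(parents=True, exist_ok=True)
    file_path = tools_dir / f"{name}.py"
    file_path.write_text(source, encoding="utf-8")  # saved to disk so it outlives this session

    spec = importlib.util.spec_from_file_location(name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the generated code

    TOOLS[name] = module.run  # assumed convention: each tool exposes run(...)
    return module.run


generated = '''
def run(celsius: float) -> float:
    return celsius * 9 / 5 + 32
'''

with tempfile.TemporaryDirectory() as tmp:
    register_generated_tool(Path(tmp) / "tools", "c_to_f", generated)
    result = TOOLS["c_to_f"](100)
```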
[Figure: GenericAgent workflow diagram]
⭐ Support
If this project helped you, please consider leaving a Star! 🙏
You're also welcome to join our GenericAgent Community Group for discussion, feedback, and co-building 👏
[Image: WeChat Group 18]
🚩 Friendly Links
Thanks for the support from the LinuxDo community!
Community GUIs (independent open-source projects):
📄 License
MIT License — see LICENSE
Disclaimer: This project does not build or operate any commercial website. Apart from DintalClaw, no institution, organization, or individual is currently officially authorized to conduct commercial activities under the GenericAgent name.