Interoperability smoke test suite
Summary
As a server framework, Fedify's core value lies in its ability to correctly interoperate with other ActivityPub implementations in the Fediverse. Currently, we rely on unit tests and manual testing, but we lack an automated, systematic way to verify E2E interoperability, similar to Node.js's CITGM (canary in the gold mine).
This issue proposes creating a new CI workflow dedicated to running smoke tests against live instances of major ActivityPub servers (e.g., Mastodon, Misskey) to ensure our federation logic is robust and compatible.
Proposed solution
The plan involves three main components:
- CI workflow: A GitHub Actions workflow that uses `docker-compose` to spin up services for:
  - One or more target ActivityPub servers (e.g., a Mastodon instance).
  - Our “Fedify test harness” application.
- Fedify test harness: Since Fedify is a library, we will create a minimal, lightweight Fedify application within this repository (e.g., under `test/smoke-harness/`). Its sole purpose is to serve as an endpoint for these tests.
- CI orchestrator: The main test script (e.g., a Deno script) that orchestrates the E2E test (a startup sketch follows this list).
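For illustration, bringing the stack up from the orchestrator could be as simple as the sketch below. The compose file path `test/smoke-harness/docker-compose.yaml` is an assumption; `docker compose up --wait` blocks until the services report healthy.

```typescript
// Sketch: start the smoke test stack from the Deno orchestrator.
// The compose file path is an assumption — adjust to wherever it lives.
const compose = new Deno.Command("docker", {
  args: [
    "compose",
    "-f", "test/smoke-harness/docker-compose.yaml",
    "up", "-d", "--wait", // --wait blocks until health checks pass
  ],
});
const { success, stderr } = await compose.output();
if (!success) {
  throw new Error(`failed to start stack: ${new TextDecoder().decode(stderr)}`);
}
```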
Implementation details
This E2E test cannot exercise Fedify in isolation. It must verify that actions are correctly sent, received, and interpreted by both sides.
1. Fedify test harness app
- It will be a minimal `fedify` app with basic handlers (e.g., for `Actor`, `Inbox`, `Outbox`).
- It will use a simple data store (e.g., in-memory or Deno KV).
- Crucially, it will expose internal “backdoor” APIs for the test orchestrator (e.g., `POST /_test/follow`, `POST /_test/create-note`, `GET /_test/get-latest-inbox-item`); a sketch of such a harness follows.
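As a rough illustration, the harness could be wired up as below. This is a minimal sketch against Fedify's `createFederation` API; exact option and method names should be checked against the Fedify version in use, the `harness` username and `/_test/*` routes are assumptions from this proposal, and real federation would additionally need key pairs for HTTP Signatures.

```typescript
// test/smoke-harness/main.ts (hypothetical path) — minimal sketch, not a full harness.
import { Create, createFederation, MemoryKvStore, Person } from "@fedify/fedify";

const federation = createFederation<void>({
  kv: new MemoryKvStore(), // simple in-memory store; Deno KV would also work
});

// Minimal actor so remote servers can resolve the harness user via WebFinger.
federation.setActorDispatcher("/users/{identifier}", (ctx, identifier) => {
  if (identifier !== "harness") return null;
  return new Person({
    id: ctx.getActorUri(identifier),
    preferredUsername: identifier,
    inbox: ctx.getInboxUri(identifier),
  });
});

// Record every incoming Create so the orchestrator can assert on it later.
const inboxLog: unknown[] = [];
federation
  .setInboxListeners("/users/{identifier}/inbox", "/inbox")
  .on(Create, async (_ctx, create) => {
    inboxLog.push(await create.toJsonLd());
  });

Deno.serve((request) => {
  const url = new URL(request.url);
  // Backdoor endpoint used only by the CI orchestrator.
  if (url.pathname === "/_test/get-latest-inbox-item") {
    return Response.json(inboxLog.at(-1) ?? null);
  }
  // Everything else (actor, inbox, WebFinger) is handled by Fedify.
  return federation.fetch(request, { contextData: undefined });
});
```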
2. CI orchestrator and verification
The orchestrator script will manage the entire test flow by communicating with both our test harness and the target server's API.
Example scenario: Fedify → Mastodon (Create(Note))
- Setup: The orchestrator uses Mastodon's API (`tootctl` or REST) to create a test user (`@mastodon-user`) and get an API token.
- Action: The orchestrator calls our harness's backdoor API: `POST /_test/create-note?content=...`
- Federation: Our harness app, using `fedify`, sends a `Create` activity to `@mastodon-user`'s inbox.
- Verification: The orchestrator uses the Mastodon API token to poll `@mastodon-user`'s home timeline (`GET /api/v1/timelines/home`).
- Assert: The test passes if the new note from our Fedify harness appears on the Mastodon user's timeline within a timeout (see the sketch below).
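Concretely, the action-through-assert steps could look like the following sketch. `HARNESS_URL`, `MASTODON_URL`, and `MASTODON_TOKEN` are hypothetical environment variables the workflow would inject; `GET /api/v1/timelines/home` is Mastodon's actual home timeline endpoint.

```typescript
// Sketch: Fedify → Mastodon verification loop in the Deno orchestrator.
const harness = Deno.env.get("HARNESS_URL")!; // hypothetical env vars
const mastodon = Deno.env.get("MASTODON_URL")!;
const token = Deno.env.get("MASTODON_TOKEN")!;

// 1. Trigger the note through the harness's backdoor API.
const content = `smoke-test ${crypto.randomUUID()}`;
await fetch(
  `${harness}/_test/create-note?content=${encodeURIComponent(content)}`,
  { method: "POST" },
);

// 2. Poll the Mastodon user's home timeline until the note federates or we time out.
const deadline = Date.now() + 60_000;
while (Date.now() < deadline) {
  const response = await fetch(`${mastodon}/api/v1/timelines/home`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const statuses: { content: string }[] = await response.json();
  if (statuses.some((status) => status.content.includes(content))) {
    console.log("note federated to Mastodon");
    Deno.exit(0);
  }
  await new Promise((resolve) => setTimeout(resolve, 2_000));
}
throw new Error("note did not appear on the Mastodon timeline in time");
```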
Example scenario: Mastodon → Fedify (reply)
- Action: The orchestrator uses the Mastodon API to post a reply to the note from the previous test.
- Federation: The Mastodon server sends a `Create` (reply) activity to our Fedify harness's inbox.
- Verification: The orchestrator calls our harness's backdoor API: `GET /_test/get-latest-inbox-item`.
- Assert: The test passes if the harness returns the reply activity from Mastodon (sketched below).
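The reverse direction reuses the same polling pattern, this time against the harness backdoor; the reply itself would be posted through Mastodon's `POST /api/v1/statuses` endpoint with `in_reply_to_id`. As before, `HARNESS_URL` is a hypothetical environment variable.

```typescript
// Sketch: wait for Mastodon's Create(reply) to reach the harness inbox.
const harness = Deno.env.get("HARNESS_URL")!;

const deadline = Date.now() + 60_000;
let reply: Record<string, unknown> | null = null;
while (Date.now() < deadline && reply === null) {
  const response = await fetch(`${harness}/_test/get-latest-inbox-item`);
  const item = await response.json();
  // A stricter assertion could also match the object's `inReplyTo` against our note.
  if (item !== null && item.type === "Create") reply = item;
  else await new Promise((resolve) => setTimeout(resolve, 2_000));
}
if (reply === null) {
  throw new Error("reply from Mastodon never reached the harness inbox");
}
```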
CI strategy
- These tests will be too long-running and resource-intensive to run on every PR.
- They should be configured to run on pushes to:
  - `main`
  - `next`
  - Maintenance branches (e.g., `*.*-maintenance`)
- We will also add a `workflow_dispatch` trigger to allow them to be run manually on a specific PR branch when necessary (e.g., when federation code is changed).
Phased rollout (target implementations)
We will add implementations gradually.
- Phase 1 (Core microblogging):
  - Mastodon (De-facto standard)
  - Misskey (Major alternative with different characteristics)
- Phase 2 (Major stacks & types):
  - Akkoma/Pleroma (Elixir-based)
  - Pixelfed (Media-focused, `Image`/`Video` objects)
- Phase 3 (Service diversity):
  - PeerTube (Video / `Group` actor for channels)
  - Lemmy / Kbin (Community / `Group` interaction)
  - WriteFreely (`Article` objects)
Acceptance criteria (for this task)
- A CI workflow is created.
- A minimal Fedify test harness app is built within the repo.
- The workflow successfully runs E2E tests (e.g., `Follow`, `Create(Note)`, reply) against Mastodon.
- The workflow is configured to run on pushes to `main`, `next`, and `*.*-maintenance` branches, and on `workflow_dispatch`.
Have you seen https://pasture.funfedi.dev/, via https://nlnet.nl/project/FediverseTestFramework/ ?
@nikclayton Thanks for the pointer! It looks really useful for this—especially the pre-configured Docker containers for Mastodon, Misskey, and other fediverse applications, which would save us significant setup time. The actor verification tools could also help validate our test harness implementation, and the support tables might guide us on which ActivityPub object variations to prioritize testing. I'll definitely look into integrating Pasture's components into our smoke test suite design.
As the author of https://pasture.funfedi.dev/, I have some useful comments and mea culpas:
- The pasture is currently at the state of “works reliably on my machine.” This means I can update the containers https://containers.funfedi.dev/ on a regular basis, and the thing stays working.
- My impression from my fellow developers is that the compose files might not work with certain other setups (Fedora + Podman, Apple machines with ARM chips). I consider these hard issues to solve; I do not have the test hardware or infrastructure lying around to fix them.
- I suggest starting integration tests with Mitra. It has relatively sane error messages, is lightweight to run, and does not require patches.
- I'm happy to cooperate if there are tasks for me to do on the infrastructure.
On the other side of the coin: I've submitted a proposal to NLnet that would take the data from the support tables https://funfedi.dev/support_tables/ and use them to determine a rule set for what a parser from ActivityPub to an application format should support. If this gets done, E2E tests become pretty much obsolete; following this rule set will be enough.
@HelgeKrueger Thanks for the detailed insights and the offer to collaborate! I appreciate the transparency about the current state.
I'm actually running Fedora + Podman myself, so I might encounter some of those setup issues you mentioned; if I do, I'd be happy to contribute fixes back to the project. Starting with Mitra sounds like a great suggestion given its sane error messages and lightweight nature.
Your NLnet proposal for deriving parser rule sets from the support tables sounds really promising; it would indeed make interoperability testing much more systematic. I'll definitely reach out when we get to the implementation phase of this smoke test suite.