dissect.target

Migrate to monorepo

Open • twiggler opened this issue 5 months ago

Introduction

Currently, the dissect project is split across thirty-something repositories. The disadvantages of this are:

  1. Cross-project refactorings and features are cumbersome, because they require PRs on multiple repositories.
  2. Refactorings regularly break downstream projects.
  3. The project is hard to manage.

Solution

One solution to these problems is to convert to a monorepo with the following directory layout:

project-root/
├── dissect.cstruct/
│   ├── pyproject.toml
│   ├── dissect/
│   │   └── cstruct/
│   └── tests/
└── dissect.target/
    ├── pyproject.toml
    ├── dissect/
    │   └── target/
    └── tests/
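To make the layout concrete, here is a minimal sketch of how tooling at the repository root could find the individual projects. It assumes, as in the tree above, that every top-level directory containing a pyproject.toml is one publishable project, and that its import package lives at the dotted path of the directory name. discover_projects and package_dir are hypothetical helpers, not part of dissect or of any build tool.

```python
from pathlib import Path


def discover_projects(root: Path) -> dict[str, Path]:
    """Map each project name (e.g. "dissect.cstruct") to its directory.

    Assumption: every direct subdirectory of the repository root that
    contains a pyproject.toml is one publishable project.
    """
    projects = {}
    for child in sorted(root.iterdir()):
        if child.is_dir() and (child / "pyproject.toml").is_file():
            projects[child.name] = child
    return projects


def package_dir(project_dir: Path) -> Path:
    """Return the import package directory, e.g. dissect.cstruct/dissect/cstruct."""
    # "dissect.cstruct" -> ("dissect", "cstruct") -> dissect.cstruct/dissect/cstruct
    return project_dir.joinpath(*project_dir.name.split("."))
```

Directories without a pyproject.toml (documentation, CI configuration and so on) are simply skipped.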

Note: Although all the projects are now in a single repository, individual projects such as dissect.cstruct can still be published.

This gives us atomic cross-project commits. It also becomes fairly easy to run the unit tests of dissect.target against a development version of dissect.cstruct.
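As a sketch of that second point, assuming pip and pytest are available in the current environment: installing dissect.cstruct and dissect.target in editable mode from their directories makes the dissect.target tests import the working-tree copy of dissect.cstruct instead of a released version. dev_test_commands is a hypothetical helper that only builds the commands; it does not run them.

```python
import sys
from pathlib import Path


def dev_test_commands(root: Path, under_test: str, dev_deps: list[str]) -> list[list[str]]:
    """Build commands that test `under_test` against local checkouts of `dev_deps`.

    Every project is installed in editable mode (pip install -e), so imports
    resolve to the source directories inside the monorepo.
    """
    install = [sys.executable, "-m", "pip", "install"]
    for name in [*dev_deps, under_test]:
        install += ["-e", str(root / name)]
    test = [sys.executable, "-m", "pytest", str(root / under_test / "tests")]
    return [install, test]
```

The commands could then be executed one by one with subprocess.run(cmd, check=True). Note that pip only accepts the local dissect.cstruct if its version satisfies the requirement declared by dissect.target; otherwise the resolver reports a conflict.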

Research

There are multiple build systems available such as

Some of the key roles of a build system in a monorepo are:

Dependency Management: At its core, a build system understands the intricate web of dependencies between all the projects and packages within the monorepo. It builds a "dependency graph" to know what needs to be built or tested when a change is made.
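As an illustration, Python's standard graphlib module can already compute a valid build or test order from such a dependency graph. The graph below is an assumed example (in practice the edges would come from the dependencies declared in each pyproject.toml), and build_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Illustrative graph (an assumption, not read from the real projects):
# each project maps to the projects it depends on.
DEPENDENCIES = {
    "dissect.util": [],
    "dissect.cstruct": [],
    "dissect.target": ["dissect.cstruct", "dissect.util"],
}


def build_order(deps: dict[str, list[str]]) -> list[str]:
    """Return all projects so that every dependency comes before its dependents."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())
```

The relative order of independent projects (here dissect.util and dissect.cstruct) is not fixed; only the dependency constraints are guaranteed.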

Task Orchestration & Unified Tooling: It provides a single, unified way to run tasks like build, test, lint, or format across different projects. You can execute commands from the root of the repository without needing to navigate into individual project directories. This ensures that the same tooling and configurations are consistently applied everywhere.
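A minimal sketch of unified task execution, assuming each project is tested with pytest and built with the PyPA build package: one shared task table is applied to every project directory, so all projects run the same commands, started from the repository root. TASKS and run_task are hypothetical names, not the interface of Pantsbuild or any other tool.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical task table: task name -> command run inside each project directory.
TASKS = {
    "test": [sys.executable, "-m", "pytest"],
    "build": [sys.executable, "-m", "build"],
}


def run_task(
    task: str, root: Path, projects: list[str], dry_run: bool = False
) -> list[tuple[Path, list[str]]]:
    """Run `task` in every project directory and return the (cwd, command) plan."""
    cmd = TASKS[task]
    planned = [(root / name, cmd) for name in projects]
    if not dry_run:
        for cwd, args in planned:
            # check=True stops at the first project whose task fails.
            subprocess.run(args, cwd=cwd, check=True)
    return planned
```

Passing the projects in dependency order (for example, the output of a topological sort) means a failing dependency stops the run before its dependents are tested.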

Performance and Scalability 🚀: As a monorepo grows, keeping build and test times low is crucial. Build systems achieve this through:

  • Caching: It avoids re-running tasks (like builds or tests) that have already been completed for the same code, retrieving the results from a local or remote cache instead.
  • Parallel Execution: It leverages the dependency graph to run independent tasks in parallel, significantly speeding up workflows.
  • Affected-Based Execution: It identifies only the projects that are actually affected by your code changes and runs tasks exclusively on them, saving computation time.
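Affected-based and parallel execution can be approximated in a few lines, again under assumptions: the dependency graph is an illustrative example, and task stands for whatever per-project action (such as running the tests) should happen. affected walks the reverse dependencies of the changed projects; run_parallel uses graphlib's prepare()/get_ready()/done() protocol to run, concurrently, every project whose dependencies have finished. Caching is left out of this sketch.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Illustrative graph (an assumption): project -> projects it depends on.
DEPENDENCIES = {
    "dissect.util": [],
    "dissect.cstruct": [],
    "dissect.target": ["dissect.cstruct", "dissect.util"],
}


def affected(deps: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Return the changed projects plus all projects that depend on them, transitively."""
    result = set(changed)
    grew = True
    while grew:  # repeat until no new dependents are found
        grew = False
        for project, requirements in deps.items():
            if project not in result and result.intersection(requirements):
                result.add(project)
                grew = True
    return result


def run_parallel(
    deps: dict[str, list[str]], targets: set[str], task: Callable[[str], object]
) -> list[str]:
    """Run task(project) for every target, concurrently where the graph allows.

    Returns the projects in the order they were marked done.
    """
    # Keep only edges between targets: unaffected dependencies need no re-run.
    subgraph = {p: [d for d in deps[p] if d in targets] for p in targets}
    sorter = TopologicalSorter(subgraph)
    sorter.prepare()
    finished = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # all dependencies of these are done
            # pool.map runs the batch concurrently and yields results in order.
            for project, _ in zip(ready, pool.map(task, ready)):
                sorter.done(project)
                finished.append(project)
    return finished
```

This schedules batch by batch, waiting for a whole ready set before requesting the next one; dedicated build tools schedule more finely and add caching on top. For a graph this small, a plain script along these lines may be a reasonable baseline when comparing against a dedicated build tool.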

Consistent and Reproducible Environments: The build system guarantees that every build and test runs in a consistent environment with the same versions of dependencies and tools.

While this is nice, the question is whether we actually need a build tool, since the dependency graph is fairly simple. In any case, @twiggler has made a proof of concept (POC) using Pantsbuild for a couple of key repositories. It would be good to compare and contrast it with other solutions.

twiggler, Jul 09 '25 12:07