
[code-infra] Move Link to @mui/docs

Janpot opened this issue 1 year ago • 13 comments

Next step in https://www.notion.so/mui-org/mui-docs-02a0476bcb2d4e298e8540fd2df5a137

This creates a DocsProvider, which is intended to store host-specific things for docs components. For now it only contains the docs/config and the UserLanguageProvider.

  • moved i18n to docs package
  • moved Link to docs package

supersedes https://github.com/mui/material-ui/pull/40856

  • [x] Integration on X: https://github.com/mui/mui-x/pull/11880

Janpot avatar Feb 01 '24 13:02 Janpot

Netlify deploy preview

https://deploy-preview-40889--material-ui.netlify.app/

Bundle size report

No bundle size changes (Toolpad)
No bundle size changes

Generated by :no_entry_sign: dangerJS against b41c6330b6677536625832a4d89d2b179f39bc91

mui-bot avatar Feb 01 '24 13:02 mui-bot

@mui/code-infra I love the idea of isolating what's docs-infra and what's docs into separate folders in https://github.com/mui/material-ui. For example, that's the objective with https://github.com/mui/material-ui/tree/master/docs/public/static/docs-infra. I think it rings true with 2 ⬇️: having correct abstractions so we know what impacts what.

However, when I look at https://www.notion.so/mui-org/mui-docs-02a0476bcb2d4e298e8540fd2df5a137, and @mui/docs, there are two things I'm missing:

Why work on @mui/docs now? As far as I understand the problem, we want Base UI to have its own repository:

  1. The pragmatic thing sounds to me to reproduce what MUI X and Toolpad did. We need to do it for MUI Connect/Design kits as well. I don't expect this to be a lot of work, a week? Doing this first sounds valuable: not creating abstractions until we experience the duplication pain.
  2. Then, the immediate problem we will have is code duplication. So we should create correct API abstractions, to remove all the duplication. For example to fix the static files cross-dependency issue.
  3. Then, the highest priority would be to evolve the docs-infra, we have a good chunk of performance issues, a Next.js App Router migration to do, a theming differentiation to create, and a likely migration to a subdomain.
  4. Then, the highest priority might be to isolate the npm dependencies, though we have been living with them for 3 years now and I don't recall this leading to a lot of problems.

Why this direction with @mui/docs?

  1. Is an npm package the right solution? As far as I understand the problem, this https://github.com/mui/mui-x/blob/e50aae8ae87e63cf67a67b2da5ffad16d8cf9651/package.json#L89 is not great because we have no isolation of the npm dependencies with yarn. How far did the discussion on adding https://github.com/mui/material-ui as a git submodule go? I mean, reproducing the Hashicorp setup: https://github.com/hashicorp/vault/blob/main/website/scripts/website-start.sh#L30. Why would this not be simpler? We wouldn't need to publish, we wouldn't need to build, only import and have the transpilers do their magic.
  2. Why not enforce the code to move in sync with a single npm package? Assuming we can't use git submodules, why have multiple npm packages, e.g. @mui/docs? Having a single @mui/internal npm package and putting everything inside forces us to keep everything in sync; it's a forcing function to prevent making large technical debt "debits". Large breaking changes would be spotted sooner. Are we at a scale where we need to migrate code chunk by chunk? I doubt it; if we were 100 engineers, it would make more sense to me. I think the file should be the smallest level at which we support code evolution, and the folder the smallest level at which we support code segmentation, not npm packages. Need to migrate from React 17 to 18 on a massive codebase? You install both and migrate one file at a time.

In practice, it could look like this:

import { Bar } from '@mui/internal/docs-infra/foo/baz';

oliviertassinari avatar Feb 05 '24 07:02 oliviertassinari

Not going to comment here on the why now (cc @mnajdova @michaldudak), but I can already comment on some of the points made.

  1. The pragmatic thing sounds to me to reproduce what MUI X and Toolpad did. We need to do it for MUI Connect/Design kits as well. I don't expect this to be a lot of work, a week? Doing this first sounds valuable: not creating abstractions until we experience the duplication pain.

We've made this exact argument when we built the Toolpad docs, and we felt the duplication pain. The pains we encountered are dependency management and collaborating while making breaking changes to the setup. The duplication is described in the notion doc.

  • We're attempting to solve collaboration through the code infra team as a framework for better communication between the teams regarding infra.
  • We attempt to solve the dependency problem through publishing internal packages.
  • We attempt to solve the duplication by creating a @mui/docs package that contains common components (demos, examples, _app, _document, Link, ...) and a Next.js plugin that sets up a common runtime (a sketch follows below this list).
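
To make the plugin idea concrete, a minimal sketch, assuming a hypothetical withDocsInfra entry point (the real plugin name, import path, and options may differ):

// next.config.mjs in a product docs site
import { withDocsInfra } from '@mui/docs/nextConfig'; // hypothetical import path

export default withDocsInfra({
  // site-specific overrides (redirects, env, markdown paths, ...) go here
});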

We can do several more repositories like this, but there is always a risk that we further bifurcate the setup. We may end up with every team doing their own docs/code infra again. We already have certain differences between Toolpad and X that add challenges to extracting the @mui/docs package. This will only increase as more teams adopt this pattern.

For example to fix the static files cross-dependency issue.

This is a problem of implicit dependencies as well. One way it can be solved is by making them explicit: use an image loader in the build pipeline and add the images to the @mui/docs package as JavaScript imports, and nobody will ever have to think about copying them over or invalidating their cache anymore (a sketch below). If you use the @mui/docs package as a vehicle for sharing deduplicated code, there is a case to be made for doing it first.
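
For illustration, a minimal sketch of that idea, assuming the shared build pipeline registers an asset loader for images (the asset path and component are hypothetical):

// Inside a @mui/docs component: the image is an explicit JS import, so the
// bundler fingerprints the file and cache invalidation happens automatically.
import productLogoUrl from '../assets/product-logo.svg'; // hypothetical asset

export function ProductLogo() {
  return <img src={productLogoUrl} alt="Product logo" width={32} height={32} />;
}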

  1. Then, the highest priority would be to evolve the docs-infra, we have a good chunk of performance issues, a Next.js App Router migration to do, a theming differentiation to create, and a likely migration to a subdomain.

Migrations could become less challenging when there are strong contracts between the different repositories. The published packages are a stronger contract than randomly importing nested folders of a nested workspace of a github dependency.

  1. Is an npm package the right solution? As far I understand the problem, this is not great because we have no isolation of the npm dependencies with yarn.

You'll have to clarify what you mean by isolation. The main problems I see with dependencies in our current setup:

  • We write code in the core docs that depends on packages installed in the core docs. These aren't automatically installed in X/Toolpad. Therefore there is no guarantee the dependency exists, and if it exists, that it's the correct version. And even if it's the correct version, I believe there is no guarantee it's being loaded from the same folder. You say "I don't expect this to be a lot of work, a week?". I've spent a multiple of this time dealing with these dependency problems.
  • We write code in the core docs that depends on the core docs packages as they are in that commit. In Toolpad and X, that same code uses the packages as they are installed in the node_modules folder. This means the core docs sometimes use APIs from @mui/styles that aren't yet present in the Toolpad/X setup. We had the monorepo dependency broken because of that for a week only last week. An explicitly published package can define which dependencies it's supposed to work with.

I've spent considerable time dealing with these issues over the past 2 years. The issues always come unexpectedly, and when the monorepo dependency becomes unmergeable, it affects everyone operating on this interface between core and the other repositories.

I mean, reproducing Hashicorp setup. Why would this not be simpler? We wouldn't need to publish, we wouldn't need to build, only import and have the transpilers do their magic.

Because the magic becomes a whole lot less magical if all you're doing is generating static HTML from static markdown: that's a pure function that doesn't depend on the environment. Our use case involves including highly dynamic content in our markdown (demos/playgrounds). This content is more than just md files; it includes JavaScript files and their dependencies. JavaScript that has to live partly in different places: partly in the docs folder and partly in the docs infra. package.json dependencies are a great way to create a strong contract between a JavaScript file and its dependencies (a sketch below). But that only works if you install them correctly.
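
A sketch of that contract, with hypothetical version ranges: dependencies the package installs for itself, and peerDependencies it expects the host docs site to provide:

{
  "name": "@mui/docs",
  "version": "1.0.0",
  "dependencies": {
    "clsx": "^2.1.0"
  },
  "peerDependencies": {
    "@mui/material": "^5.0.0",
    "next": "^13.0.0 || ^14.0.0",
    "react": "^17.0.0 || ^18.0.0"
  }
}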

  1. Why not enforce the code to move in sync with a single npm package?

Yep, I'm in favor of larger packages as well. Though from an organizational perspective, I think one single package puts too many unrelated concerns together. I've always advocated for a @mui/scripts package and a @mui/docs package, one for each of the code infra and docs infra teams. This allows those teams to operate a bit more independently and reduces the need to sync updates across teams.

Janpot avatar Feb 05 '24 10:02 Janpot

Not going to comment here on the why now (cc @mnajdova @michaldudak), but I can already comment on some of the points made.

On the weekly call when this was discussed, we agreed to do it this way; we even mentioned that each change would be tested in both X and Toolpad to make sure the abstractions are done right. This is what I remember, but @michaldudak can probably answer with more details.

mnajdova avatar Feb 05 '24 10:02 mnajdova

Why not enforce the code to move in sync with a single npm package?

I am a supporter of multiple smaller packages here.

Having several fine-grained packages scales better and allows us to move faster as an organisation. The current setup with the monorepo dependency can be thought of as the equivalent of one big package dependency. Numerous times I could not test whether a change in one of the shared utils works in the X repo, because some completely unrelated code hadn't been integrated yet and the latest master did not build cleanly. Having code in separate packages allows me to iterate on it without waiting on unrelated modules. This follows the UNIX philosophy of "making each program (a package in our case) do one thing well", which I believe in.

why now

Because as we're introducing yet another repository, I would like to avoid generating more technical debt right from the start. As @Janpot pointed out, the current setup requires an overhaul, but I can understand there's never a good time to make such significant changes. Introducing a new repository was a good motivation to finally take care of this.

michaldudak avatar Feb 06 '24 13:02 michaldudak

We've made this exact argument when we built the Toolpad docs, and we felt the duplication pain.

@Janpot OK, I can see this. We could say: we invert steps 1 and 2 because we have enough "materials" to work with; MUI X and Toolpad are enough. With this in mind:

  • 1 Solve code duplication
  • 2 Reproduce what MUI X and Toolpad did
  • 3a Evolve the docs-infra
  • 3b Isolate the npm dependencies (what this PR seems to do)

This is a problem of implicit dependencies as well.

I'm not sure it's so much about 3b. If we solve the CSS file duplication by migrating to the Next.js App Router, it looks like 3a. If we solve it with a script that copies the source, it looks like 1.

Migrations could become less challenging when there are strong contracts between the different repositories.

Agree. What I see is that we might be able to get 80% of the value by working on 1: creating a dedicated docs-infra folder to host that logic. We might not need to go as far as introducing an npm package yet.

I've spent a considerable time dealing with these issues over the past 2 years. The issues always come unexpected

Ok, so when I said "we have been living with it for 3 years now", I actually only truly experienced 1 of these 3 years. The difference might have been that I knew about all the changes in core because I merged or reviewed them, so when I saw them break X, it felt simple to fix.

Because the magic becomes a whole lot less magic if all you're doing is generating static html from static markdown. it's a pure function that doesn't depend on environment. Our use case involves including highly dynamic content in our markdown (demos/playgrounds).

I don't know, to me it feels like we would trade mui-x/node_modules/@mui/monorepo/source-of-main-repo for mui-x/monorepo/source-of-main-repo, the only difference being that we can run pnpm twice.

The downside seems to be that contributors of MUI X would have more to do than git pull & pnpm i for their environment to be set, they would also need to run a third command that syncs.

The upside seems to be no npm publish, but to be fair, a continuous CI step that releases each npm version sounds like enough, though it's something we have to maintain.

Ok, not as much of a clear win then? The pain of community contributions frightens me. I contributed to repositories in the past where it was more than these two commands; pretty bad UX.

I've always advocated for a @mui/scripts package and an @mui/docs package, one for each team of code infra and docs infra. This allows for those teams to operate a bit more independently and reduces the need for syncing updates across teams.

I think we should handle changes in the infra so that no syncing is required, but small continuous updates work. I see one value in having @mui/docs as its own thing: the potential to provide something like https://nextra.site/ as an open-source project. But I'm not sure we should or ever will, so not a strong reason.

Numerous times I could not test whether a change in one of the shared utils works in the X repo, because some completely unrelated code hadn't been integrated yet and the latest master did not build cleanly

@michaldudak This is the value I see on my end with a single npm package: back pressure. As I understand the example of the pain: someone made too many debt "debits", changing an API without taking into account the bandwidth of the dependents to handle it. A single package makes us slow down to handle this debt; it wouldn't go unnoticed. Either we learn it's too much and revert, or we push through it and learn for the next time.


Ok, so from all of this, as long as these are reflected on, 👍:

  • Should the npm package be done first? It's not the biggest pain I have experienced. But I could be in the minority; taking into account the pool of people who worked on the docs-infra/code-infra, it's not so clear, so OK.
  • Do we need two npm packages? I see a lot of value in back pressure, but I also resonate with Highly Aligned, Loosely Coupled. It feels too early, but we will get there, so OK.
  • npm vs. git submodule: the contribution experience for a community member doesn't sound amazing, but if we make the npm publish experience seamless, then 👌

oliviertassinari avatar Feb 06 '24 20:02 oliviertassinari

OK, I can see this. We could say: we invert steps 1 and 2 because we have enough "materials" to work with; MUI X and Toolpad are enough.

Just want to add that I don't see these exactly as 100% discrete steps, i.e. to do 3 you need to do some of 1 as well, and vice versa. For me it's more a question of whether we do exactly what Toolpad/X did, or we start applying some or all of the lessons we've learnt from these experiences. Both are valid strategies imo; it's a matter of choice.

creating a dedicated docs-infra folder to host that logic.

But to me, an npm package is a dedicated docs-infra folder that hosts that logic. It just has the added feature that it can automatically set up external dependencies for the logic in that folder. I also want to add that, from the core perspective, @mui/docs is a package and has its dependencies defined. X and Toolpad will still include it as a subfolder of their monorepo dependency for a while, so effectively, with this PR, we're still only solving 1. We get to 3 once it has all the logic it needs to live on its own. This PR specifies the dependencies because it sets up a correct package structure from the start, which helps make TypeScript and the linters happy.

Agree. What I see, is that we might be able to get 80% of the value by working on 1: creating a dedicated docs-infra folder to host that logic. We might not need to go as far as introducing an npm package yet.

The package is that folder; it already was. I'm just starting to formalize it more by moving more into it. It's already published, but neither X nor Toolpad use that; they load it from their monorepo dependency instead, as a folder.

The benefit of moving things into this package is to make it clear in the core repo what is shared logic, and where you need to be mindful about introducing breaking changes. That signal currently isn't there, and it's a useful one.

I don't know, to me it feels like we would trade mui-x/node_modules/@mui/monorepo/source-of-main-repo for mui-x/monorepo/source-of-main-repo, the only difference being that we can run pnpm twice.

The downside seems to be that contributors of MUI X would have more to do than git pull & pnpm i for their environment to be set, they would also need to run a third command that syncs.

The upside seems to be no npm publish, but to be fair, a continuous CI step that releases each npm version sounds like enough, though it's something we have to maintain.

I have a hard time following what you're trying to communicate. Can you elaborate more on this section?

We're a JavaScript shop. Task one in any repo we have will always be pnpm install: locally after checking out, in our CI, in Netlify, ... An npm package is a unit of organizing code that also specifies external dependencies. But that only works if we correctly integrate that package in the workflow, which we currently don't do. If I read between the lines, I'd suspect your aversion is to the fact that we have to npm publish. I understand that concern, I have the same. I started thinking more about it and came to the following conclusions:

  • we're currently already publishing our packages on a weekly basis; we can publish our internal ones alongside without extra work, and we can omit them from the changelog. Maybe I'm missing something?
  • we're currently updating the monorepo dependency on a weekly basis in X/Toolpad. We can do the same for the published packages, or better: delegate that task to renovatebot. This makes publishing on a weekly basis enough. If not, we can always add a separate script to only publish the internal packages, or use a codesandbox build from the master branch.
  • we can use codesandboxci to test changes on dependent repositories before merging them.
  • and if all of this still turns out to be too cumbersome, pnpm just added support for subfolder dependencies (a sketch after this list). There is always a road back to github dependencies while keeping some of the dependency resolution problems solved.
  • npm packages are a concept that is very well understood by our maintainers and community.
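
As a sketch of that fallback, a git-hosted dependency in a dependent repo's package.json could point at the package's subfolder instead of the repo root (the exact specifier syntax should be double-checked against the pnpm docs; the ref here is hypothetical):

"dependencies": {
  "@mui/docs": "github:mui/material-ui#master&path:/packages/mui-docs"
}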

It doesn't feel to me like we are giving up that much in practice, in exchange for a setup that scales better because of better boundaries and dependency handling.

I see one value in having @mui/docs as its own thing: the potential to provide a https://nextra.site/ like an open-source project. But not sure we should or ever will, so not a strong reason.

To me that is what it is, our own private nextra, adapted to our own use case.

This follows the UNIX philosophy of "making each program (a package in our case) do one thing well", which I believe in.

@michaldudak It's a good practice, but the unit of encapsulation needs to be adapted to the scale you're operating at. It can be single functions, single JavaScript files, npm packages, GitHub repositories, ... The unit that we choose can evolve over time. Personally, I feel like our scale puts us somewhere between individual files and npm packages on that spectrum. Triple the team size and we may well be squarely into npm packages as the unit of encapsulation. For that reason, I feel like 1 or 2 packages still makes sense, as opposed to making a specific package for each and every concern. But I admit there is a lot of intuition involved in this reasoning; I just think we need to be careful about being dogmatic with these sorts of philosophies.

Ok, so from all of this, as long as these are reflected on, 👍:

I feel like we are relatively closely aligned, and our main differences come from concerns around the overhead introduced by publishing.

I just want to also add that in a growing organization, extra overhead to collaborate well is to be expected. You can't have the same amount of overhead in a 30-person org as in a 3-person org.

Janpot avatar Feb 07 '24 09:02 Janpot

This is the value I see on my end with a single npm package: back pressure. As I understand the example of the pain: someone made too many debt "debits", changing an API without taking into account the bandwidth of the dependents to handle it. A single package makes us slow down to handle this debt; it wouldn't go unnoticed. Either we learn it's too much and revert, or we push through it and learn for the next time.

From my experience, this doesn't scale well. The more we grow, the more we should operate as independent teams utilizing the producer/consumer pattern. The code infra team releases packages that are consumed by the product teams when they have the need and bandwidth for it. Having smaller packages makes this easier. To be clear, I don't mean we should silo ourselves and work in isolation; it's still possible (and desirable) to help other teams and ensure our changes don't cause problems. Since we're not a single product team, we shouldn't try to act as one.

Personally, I feel like our scale puts us somewhere between individual files and npm packages on that spectrum. Triple the team size and we may well be squarely into npm packages as the unit of encapsulation.

@Janpot, could you please help me understand your reluctance to use smaller packages today?

To me, the advantages outweigh the drawbacks:

  • + easier to consume (updates don't involve unrelated changes)
  • + easier to reason about during authoring
  • + clear changelog
  • + independent technical choices for packages (for example, we can set type: module in one package without affecting others)
  • - more work for publishing
  • - potentially unsynced versions between repos (is it any more of a problem than unsynced external dependencies?)

To be clear: I don't aim to put every utility in a separate package. I'm fine with having the scripts that currently live in the root scripts folder as a single package. But when a utility grows beyond one or two files, it's a good candidate for extraction to me.

michaldudak avatar Feb 07 '24 12:02 michaldudak

@Janpot, could you please help me understand your reluctance to use smaller packages today?

Reluctance is such a negative word; it's a preference 🙂. A non-exhaustive list of reasons why:

  • It's a forcing function that reduces the incentives to make breaking changes and forces us to think about how to make changes in a backwards-compatible way instead, e.g. where a breaking change is done in a two-step process, with a compatibility mode in between. This is a strategy that scales much better as we add repositories. And it guarantees better that we don't end up with that one repository where some forgotten package stays forever on an old version. Or put differently: we don't need to facilitate making breaking changes, we need to make fewer breaking changes. Example from this PR: I extract i18n, but shim the old module as long as there is code out there that uses it (a sketch after this list), even though I intend to immediately update downstream.
  • It's about dependency management, e.g. it's easier to share code between lint scripts if they are in the same package. Otherwise, if you want to avoid circular dependencies between packages, you always have to create a third package that contains the shared code. It's a combinatorial problem: the more packages you create, the more subsets of packages there are that need to share code. And that relation doesn't grow linearly; that is the problem.
  • Our organization doesn't demand it yet. Code infra is a small team, and it will likely stay that way for quite a while.
  • I tend to find it easier to build a monolith and have it reveal the required architecture than to come up with it beforehand. It's a pragmatic approach to software architecture.
  • will add more as I think of them...
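
For illustration, the shim can be a one-line re-export from the legacy module path (the file path and package entry point here are assumptions, not the exact names from the PR):

// docs/src/modules/utils/i18n.ts: legacy module kept as a thin shim so that
// existing imports keep working while the implementation lives in the package.
export * from '@mui/docs/i18n';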

But when a utility grows beyond one or two files, it's a good candidate for extraction to me.

I don't believe line/file count is a good metric to base these decisions on. We should abstract where abstractions are needed and where organization would be improved by them. Especially 1 or 2 files per package gives me left-pad vibes.

If we do multiple packages, maybe it makes sense to put in writing first which concerns we want in a separate package?

Janpot avatar Feb 07 '24 13:02 Janpot

And it guarantees better that we don't end up with that one repository where some forgotten package stays forever on an old version

I agree that a single large package reduces this possibility. It's worth noting, however, that we use Renovate, which will act as a reminder. Besides, the repository may not need the latest package at all. I agree that having the same version of infra packages across the whole organization is more convenient (especially for the infra team), but is it a strong requirement?

It's a forcing function that reduces the incentives to make breaking changes and forces us to think about how to make changes in a backwards-compatible way instead, e.g. where a breaking change is done in a two-step process, with a compatibility mode in between.

How does a single package support this scenario better? Is there anything that prevents us from using this two-step process with multiple packages?

it's easier to share code between lint scripts if they are in the same package

:+1: That's true. So far, I had to create one package that's used solely to support other packages (docs-utils)

Our organization doesn't demand it yet. Code infra is a small team, and it will likely stay that way for quite a while.

How does the size of the infra team correlate with the content of the packages?

I tend to find it easier to build a monolith and have it reveal the required architecture than to come up with it beforehand. It's a pragmatic approach to software architecture.

Haven't we used the monolith for the past couple of years?

If we do multiple packages, maybe it makes sense to put in writing first which concerns we want in a separate package?

:+1: Good point

michaldudak avatar Feb 07 '24 13:02 michaldudak

I'm making soft arguments; none of them are blocking. You asked me to explain my preference, and these are the reasons why I have it 🙂. The main point is: when in doubt, I tend to err on the side of creating fewer packages rather than more, but never dogmatically (e.g. two is a number based on my limited understanding), always as a function of what's needed for our use case. With this discussion and the one during the code infra meeting, I feel like we're gaining consensus on what we want near-future code infra to look like.

Janpot avatar Feb 13 '24 12:02 Janpot

I updated the proposed list of packages in the shaping doc on Notion and started working on the internal-scripts package (#41079)

michaldudak avatar Feb 13 '24 13:02 michaldudak

This establishes the @mui/docs package; more components will be moved incrementally in follow-up PRs.

Janpot avatar Feb 13 '24 16:02 Janpot

an npm package is a dedicated docs-infra folder that hosts that logic

@Janpot Yes; however, making it a package brings extra overhead. As I see it: we need to define each dependency, we need to resolve circular dependencies, we need to batch renovate dependency updates. What we get in exchange seems slim. When I see feedback like https://news.ycombinator.com/item?id=15889691, I read that the right play is to diehard on staying as close as possible to one codebase, each change being atomic and org-wide. I see npm packages more as a way to share sources than binaries.

I have a hard time following what you're trying to communicate. Can you elaborate more on this section?

I'm saying that we could use git submodules to solve the same problem that npm packages solve today. But it comes with extra complexity for contributors. It's no longer:

  • pnpm i
  • pnpm docs:dev

to get a working environment. With a git submodule, the community/maintainers would need a third build step. Now, to be fair, the zero runtime introduces the need to build; if we can't remove it, we have to pay the cost of 3 steps anyway.

we're currently already publishing our packages on a weekly basis, we can publish our internal ones alongside without extra work

Disagree, it doesn't sound good enough: I think it should be for each commit, or at worst daily. We need to be able to do bisections when something goes wrong, to identify the commit to revert (a sketch of a per-commit publish below).
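
A minimal sketch of what a per-commit publish could look like in CI (the script and the canary dist-tag are hypothetical, not an existing workflow):

# Publish every merge to master under a unique, bisectable prerelease version.
SHA="$(git rev-parse --short HEAD)"
npm version "0.0.0-canary.${SHA}" --no-git-tag-version
npm publish --tag canary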

and if all of this still turns out to be too cumbersome, pnpm just added support for subfolder dependencies

This sounds great. The only downside I can think of is that when we use a GitHub repository as a dependency, we download the git history at the same time, so it's not as fast to install. But otherwise 👌

The code infra team releases packages that are consumed by the product teams when they have the need and bandwidth for it

@michaldudak The goal of having one person full-time for each of code-infra and docs-infra in 2024 is to have time to go after larger initiatives and to own the whole org. So I would expect migrations to be mostly handled by the two people in these roles. I guess that with more scale, there will be no other way than to request help from the teams.

In any case, the fundamental constraint I want to enforce is that most changes docs-infra/code-infra make should be backward compatible. We WON'T (in the majority of cases) get this backward compatibility from using older npm versions, but by keeping old APIs working until they are no longer depended on. Each time we don't update, we fail the primary objective of having a single codebase: we are out of sync with HEAD.

oliviertassinari avatar Feb 18 '24 23:02 oliviertassinari

I'm saying that we could use git submodules to solve the same problem that npm packages solve today.

You need to shape this submodules idea in more detail; I really don't see the architecture you have in mind. Do you suggest we do:

# ./.gitmodules
[submodule "monorepo"]
	path = monorepo
	url = https://github.com/mui/material-ui

to create the following file structure with a nested workspace root?

mui-x\
  docs\
    pages\
      getting-started.tsx
      _document.tsx
    package.json
    next.config.mjs
  node_modules\
  monorepo\
    docs\
      pages\
        getting-started.tsx
        _document.tsx
      package.json
      next.config.mjs
    packages\
      material\
        package.json
    package.json
  packages\
    x-data-grid\
      package.json
    x-charts\
      package.json
  package.json

This setup is essentially the same as our current setup. All this does is replace ./node_modules/@mui/monorepo/ with ./monorepo/ everywhere. It solves none of the dependency problems that we have.

When I see feedback like https://news.ycombinator.com/item?id=15889691, I read that the right play is to diehard on being as close as possible to one codebase, each change being atomic and org-wide. I see more npm packages as a way to share sources than binaries.

I think you are overlooking the following sentence: "- Blaze. Much like Bazel. This is the system that defines how to build various artifacts. All dependencies are explicit, meaning you can create a true dependency graph for any piece of code. This is super-important because of..." This touches the crux of the problem. There is no dependency graph in the submodules setup, nor is there one in our current setup. The root of our problems is dependencies, not the colocation of code files. The maintainability that comes from monorepos is atomic changes: to individual files, but also to the dependency graph as a whole.

Janpot avatar Feb 19 '24 11:02 Janpot

All this does is replace ./node_modules/@mui/monorepo/ with ./monorepo/ everywhere. It solves none of the dependency problems that we have.

@Janpot My assumption is that from there, we can configure part of /monorepo/ in the MUI X pnpm workspace and have pnpm install the monorepo dependencies.

There is no dependency graph in the submodules setup, nor is there one in our current setup. The root of our problems is dependencies, not the colocation of code files.

My assumption is that with the submodule path, we would continue to create packages, but not consume them through an npm registry.

oliviertassinari avatar Feb 19 '24 14:02 oliviertassinari

@Janpot My assumption is that from there, we can configure part of /monorepo/ in the MUI X pnpm workspace and have pnpm install the monorepo dependencies.

Yes, great answer, that's what I assumed as well; I've been playing with this in Toolpad before. So you'd update the pnpm-workspace.yaml to

packages:
  - 'packages/*'
  - 'docs'
  - 'test'
  - 'monorepo/packages/*'

right? So that all the core workspaces become X/Toolpad workspaces as well. Problems that I can think of that need to be anticipated or resolved:

  • I don't expect this setup to be high on the priority list to be supported by package managers. Expect open tickets that never get resolved.
  • this will install dependencies for all core packages in X/Toolpad.
  • in core no workspace can implicitly depend on the workspace root dependencies anymore as these won't be installed in X/Toolpad.
  • If X docs depends on core docs with the specifier workspace:* and core docs depends on @mui/material: 'workspace:*', then how do we avoid duplicate dependencies if we don't also specify @mui/material: 'workspace:*' in e.g. the data grid? And in the latter case, what does our publishing flow look like for X? Surely we don't want X to publish the core packages as well? Also, MUI X would now be using an unreleased @mui/material; when does it publish? Does it only use release commits of the core repo? And what does it rewrite those @mui/material: 'workspace:*' specifiers to? Also, codesandboxci would then have to publish a version of all the packages (core + X).
  • We must be mindful about duplicate names of packages, especially with internal or test packages. e.g. what if core introduces a tests package that already exists in one of the dependent repos.
  • This setup still encourages using the package format for declaring dependencies. I think that's a great thing and something we need to take as part of any setup going forward.
  • We will still have to set up all the aliasing that exists in the core repo and keep it in sync between repos.
  • Likely we'll have to migrate X to pnpm before we can use this setup, as we can't really mix yarn/pnpm workspaces
  • We're tightly coupling all our repositories to the same package manager, and probably at some point even to a subset of compatible versions of that package manager.

I'm not rejecting this but for me:

  • There already is a strong contract in @mui/docs: it is tightly coupled to a @mui/material version (because of workspace:*), so it makes sense to me that we make it part of the core release cadence.
  • We have codesandboxci-published versions for canary testing.
  • We could automate publishing master to npm for each merge, under its git SHA as a version tag. This allows us to bisect. Another way to bisect would be to use a file dependency on a locally checked-out core repo with pnpm injected (a sketch after this list).
  • I'm not claiming this setup doesn't create overhead; it definitely does. This setup treats the infra like our public packages, albeit without semver constraints. It couples the monorepo update to the core packages update to guarantee consistency. It trades the flexibility that exotic setups create for stronger guarantees.
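
For the local-checkout bisection idea, a sketch of a dependent repo's package.json (the paths are hypothetical, and this assumes pnpm's dependenciesMeta injected option applies to a file: dependency the way it does to workspace packages, hard-linking a copy instead of symlinking):

{
  "dependencies": {
    "@mui/docs": "file:../material-ui/packages/mui-docs"
  },
  "dependenciesMeta": {
    "@mui/docs": {
      "injected": true
    }
  }
}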

My assumption is that with the submodule path, we would continue to create packages, but not consume them through an npm registry.

Ok, so the concern really centers around publishing to an npm registry. Does that mean we can move forward with this PR, have a @mui/docs package, and continue exploring the mechanism of distribution in parallel? This PR doesn't depend on publishing at all; X/Toolpad still consume it through the @mui/monorepo dependency.

Janpot avatar Feb 19 '24 15:02 Janpot

I don't expect this setup to be high on the priority list to be supported by package managers. Expect open tickets that never get resolved.

@Janpot I don't get this one. From the perspective of pnpm, it looks like a normal folder. But OK, maybe there will be bugs when doing a git pull.

this will install dependencies for all core packages in X/Toolpad.

Unless we have two separate top level folders. One for mui/material-ui packages, one for the infra.

We must be mindful about duplicate names of packages, especially with internal or test packages. e.g. what if core introduces a tests package that already exists in one of the dependent repos.

This could help; e.g. I have VS Code configured to include all the MUI code when searching/jumping to files.

depend on core docs with specifier workspace:*

We don't have to use the explicit workspace:* syntax. We could set a regular version dependency: if it's in the workspace, pnpm resolves it locally; if not, it goes through npm (a sketch below).
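
A sketch of that resolution behavior using pnpm's link-workspace-packages setting (the version range is hypothetical):

# .npmrc at the workspace root
link-workspace-packages = true

# With this set, a regular semver range in package.json, e.g.
#   "@mui/docs": "^1.0.0"
# resolves to the local workspace package when one matches the range,
# and falls back to the npm registry otherwise.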

We could automate publishing master to npm for each merge, under its git SHA as a version tag. This allows us to bisect.

I'm not aware of how pnpm inject works. I don't see how depending on the weekly release is an option: for example, if I really have to make a breaking change to docs-infra, as soon as I've merged my change, I should be able to go to MUI X and make the change in their repository (or get buy-in from someone on that team who supports the change and is happy to make it happen).

Ok, so the concern really centers around publishing to an npm registry.

Yes, concerns are:

  1. We should make it hard to use older versions of the code alongside new versions, so that, by design, we encourage atomic steps, nonbreaking changes, and mass codebase updates. I see npm packages as making it easy to use older versions; they allow entropy to increase (this doesn't mean we have to couple everything, we can allow duplication of code).
  2. We should make it easy to delete code, move code around, and refactor. I have always experienced packages as creating friction for making changes. So packages have to give us something in exchange.

Does that mean

Yes, I'm happy with the conclusion of https://github.com/mui/material-ui/pull/40889#issuecomment-1930723619 (I don't know if the git submodule is better, but let's weigh the pros & cons).

oliviertassinari avatar Feb 19 '24 16:02 oliviertassinari

From the perspective of pnpm, it looks like a normal folder. But ok, maybe there will be bugs when doing a git pull.

It looks like a folder that looks like a workspace root but isn't used as one, with a pnpm lock file and a pnpm-workspace.yaml: a folder containing nested packages that were all written with the expectation of being installed under a different workspace root, and which will try to install under a workspace root higher up, using a lock file higher than the first one directly encountered by traversing up. I can imagine a package manager getting confused.

Unless we have two separate top level folders. One for mui/material-ui packages, one for the infra. ... We don't have to use the explicit workspace:* syntax. We could set a regular version dependency. If it's in the workspace pnpm resolves it locally, if not it goes through npm

  • But then this folder can't contain workspace:* dependencies, except when pointing to workspaces inside that same folder. This needs to be well documented and ideally also enforced somehow.
  • None of these packages can implicitly depend on the workspace root dependencies or scripts. They'll need to declare their own dependencies.
  • None of these packages can ever be published either, because that would effectively have them published by two different repositories.
  • It still permits docs infra to use unreleased features in @mui/material, and so unfortunately it doesn't solve the problem that we encountered in https://mui-org.slack.com/archives/C042YB5RB3N/p1706604203702089.

Ok, I'm open to trying out this setup; we can always move to publishing packages when we find that it doesn't scale well. @michaldudak thoughts?

Janpot avatar Feb 19 '24 17:02 Janpot

I don't see creating packages as friction. After the initial setup, authoring a package shouldn't be different from working in a subdirectory. The points raised as drawbacks ("we need to define each dependency. We need to resolve circular dependencies, we need to batch renovate dependencies updates") I see as advantages - we have a clearer dependency tree.

I agree with @oliviertassinari that they should be released as soon as possible, preferably after each merged PR. There is a way to automate this process completely (along with changelog creation, with something like conventional commits that Lerna supports). Unfortunately, with our current setup, this is somewhat problematic (I haven't found a way to release just a subset of workspace packages with lerna version and lerna publish).

One big advantage of using packages that I believe wasn't mentioned here is that they share code that's ready to use. We don't need to set up aliasing, build orchestrators, etc. We simply use the built output of the package, and it works no matter if we call it from TS, JS, Node, or a web app. Additionally, this can save us CI credits we don't need to spend on building the package each time an unrelated piece of code changes.

I'd like to point out that at the meetings during the last retreat, we collectively agreed to share code using npm packages, and if my memory serves me well, this decision was welcomed with enthusiasm by people who work with infra on a daily basis. So, I'd like to propose that we stick to the original plan, see how it performs in action, and then adjust if necessary.

I don't know if the git submodule is better

Having worked with submodules in the past, I'd advise not to go this way, especially in an open-source project. They require more discipline and are harder to grasp. The ycombinator comment you mentioned seems to agree: "I've honestly not heard anyone say anything good about Git submodules."

michaldudak avatar Feb 19 '24 18:02 michaldudak

@oliviertassinari keep up the good work.

jan avatar Feb 19 '24 18:02 jan

The points raised as drawbacks ("we need to define each dependency. We need to resolve circular dependencies, we need to batch renovate dependencies updates") I see as advantages - we have a clearer dependency tree.

Agree, most things are a tradeoff; it depends on what's valued. It reminds me of https://youtu.be/0JBOSmRo8js?si=UqYCn7X7bAYL-Pvh&t=240

oliviertassinari avatar Feb 20 '24 20:02 oliviertassinari

An alternative to git submodules: https://github.com/ingydotnet/git-subrepo; I have seen it used in https://github.com/ag-grid/ag-website-shared.

oliviertassinari avatar May 26 '24 22:05 oliviertassinari