mkosi icon indicating copy to clipboard operation
mkosi copied to clipboard

Split up mkosi into modules

Open behrmann opened this issue 4 years ago • 11 comments

Right now mkosi weighs in at 5236 lines in a single file. While it is nice to not have to think about where a function is, it is sometimes unwieldy and it will only get more so, if more distributions and or features are added. Right now it is also easy to inadvertently break another distribution while working on some unrelated functionality.

My suggestion would be to break mkosi up into multiple modules. I've started the work on a local branch to see where it could go and a possible split might look like this

mkosi
├── cli.py                    # taking the commandline interface
├── distributions             # distribution specific function
│   ├── arch.py
│   ├── centos.py
│   ├── clear.py
│   ├── debian.py
│   ├── fedora.py
│   ├── __init__.py
│   ├── mageia.py
│   ├── openmandriva.py
│   ├── opensuse.py
│   ├── photon.py
│   └── ubuntu.py
├── image.py                  # image creation, mounting
├── __init__.py               # functions used all over the place
├── install.py                # running install and copying files to the image
├── __main__.py               # main()
├── output.py                 # make_* and friends
└── partition.py              # partioning and filesystem creation

The mkosi binary would be created via a setuptools entrypoint and the work doesn't have to be done in one go. A first step can be seen in #401 where mkosi is moved to mkosi/__init__.py and the main() is moved to mkosi/__main__.py. The rest could be done step by step.

The coupling is between the different parts of mkosi is very tight at the moment, mostly because everything is passed via the args variable. Splitting up mkosi like this (or similarly, I'm open to any sort of ordering) would be a first step to uncoupling the different parts, so that some interfaces are put into the pipeline. a Distribution class that handles the bootstrapping of an installation and advertises its features instead of hardcoding all possible variants.

Related problems and opportunities to splitting:

  • Unifying (mostly) on logging instead of printing, because (in my opinion) only the direct cli code should print to screen right away. Right now most stuff is printed to console via sys.stderr.write, although there are a few calls to print here and there. My assumption is, that this was done to flush to screen right away.
  • Adding new distributions or splitting out distributions. setuptools has a feature called namespace packages. In this way distributions could be split out and installed when needed. It would also allow for people to have their own installers without touching mkosi itself, which would make for a nicer development experience.

Possible steps:

  1. #401
  2. Splitting out cli.py and moving to logging for everything that is not in cli.py
  3. Carving chunks out of CommandLineArgs and moving related functions to those parts of the config into a submodule and / or subsume them in a class.
  4. repeat 3 until satisfied with modularity

This is quite open-ended and would be a lot of work, but I think it will pay of in the end to make the code easier to follow, more maintainable and extensible.

Is a change along these lines something that is wanted?

behrmann avatar Jul 31 '20 15:07 behrmann

Is a change along these lines something that is wanted?

+1 from my side. I think moving the distro-specific parts in their own files would be the most beneficial part. I think that retaining the ability to call mkosi directly without having to run via setuptools would be nice though.

lucasdemarchi avatar Jul 31 '20 16:07 lucasdemarchi

I think that retaining the ability to call mkosi directly without having to run via setuptools would be nice though.

This changes somewhat, with what I have in #401. It loses the ability to just call mkosi via the path to the file, if one hasn't installed it, but as long as python can find the module, that is if it's in the current working directory or PYTHONPATH points to its directory, it can be called via python3 -m mkosi (as can be seen in the CI files).

That being said, we would win the ability to to editable installs (via pip install -e) and being able to properly install it into virtual environments (e.g. for different git worktrees). Generally, it would work nicely with pip, allowing distribution via PyPI, which would make it more easily usable for new users.

behrmann avatar Jul 31 '20 16:07 behrmann

Yes, but we could still make it work without that. For example we could have a mkosi script that basically appends the dir to sys.path accordingly and run the main binary

lucasdemarchi avatar Jul 31 '20 16:07 lucasdemarchi

Sure, that's possible, but is close what would be automatically generated as mkosi in ~/.local/bin when running pip install -e . in the mkosi directory, since this mkosi would then pick up whatever happens in the mkosi directory.

If somebody really doesn't want to run pip install (for whatever reason), they could drop something like that into the directory, but I'm not quite sure that we need to ship it.

#!/usr/bin/python
mkosiroot="$(dirname $0)"
exec PYTHONPATH=mkosiroot python3 -m mkosi 

behrmann avatar Jul 31 '20 17:07 behrmann

Humn... so pip install -e actually allows to run it without installing every time. I missed that, sorry. Proposal LGTM then

lucasdemarchi avatar Jul 31 '20 17:07 lucasdemarchi

I like the idea, but maybe watch out so that you're not abstracting too much. Sure, the big file is a bit unwieldy but I find the general flow quite easy to follow at the moment.

This is just speculation, but it might be possible to reduce each distro installation to a single function that also handles the bootloader. Export that function from each distro file and import shared stuff as necessary to avoid duplication (dnf, grub, pam_securetty, gpg, ...). I'm not even sure we need a class at all.

As long as you implement this in small steps, it sounds like a great idea.

DaanDeMeyer avatar Jul 31 '20 20:07 DaanDeMeyer

Related to this is how another distribution is supposed to get added to the existing tooling. Now perhaps the answer is "just read the code and try", or perhaps there really should be some clarification on what should be built out and put where. This would need to cover tests too.

I'd like to add Rolling Rhino into the mix as it isn't quite Ubuntu (you supposedly shouldn't just use apt). A documented approach to this would be nice, even if it is my own fork and never wanted upstream. If mkosi was modular then this would already be a lot clearer.

Perhaps it's incentive for me to assist on modularity.

mcassaniti avatar Jul 05 '22 08:07 mcassaniti

@mcassaniti Happy to help in adding rolling rhino! When the time comes a separate Issue or PR for this would be great.

The minimal steps to add a new distribution are:

  • Add it to the Distribution enum in backend.py
  • Add a install_foo function for your distribution Foo to __init__.py
  • Maybe add sanity checks and or defaults to load_args in __init__.py

There's more points to hook into more features of mkosi, but this should basically be the minimum code changes needed. #816 is a recent such change that you can look at.

behrmann avatar Jul 05 '22 08:07 behrmann

I do think we should split the code, but the existing split into backend.py and __init__.py is IMO a mistake: it cuts in the wrong place, and many things that are logically connected are separated into the two files. If you want to add a distro that it similar to an existing one, just do it by adding similar functions in existing files, like it was done for more distros. Separation of code can and should be done in a separate PR.

keszybz avatar Jul 05 '22 20:07 keszybz

Indeed, the current split is more of a hodge podge and the name backend.py is unfortunate as it's more core or util, but even with a more careful split this will likely look somewhat similar because of Python's import system, since the imports cannot have cycles (at least until lazy imports become a thing or by doing imports dynamically).

behrmann avatar Jul 05 '22 20:07 behrmann

I'm happy to do a separate PR for rolling rhino and I had no intentions of anything different. Can I suggest separate classes for each distribution which would allow the core workflow to call the distribution specific code at the right time?

mcassaniti avatar Jul 05 '22 23:07 mcassaniti

Let's close this as we did an initial split

DaanDeMeyer avatar Feb 14 '23 14:02 DaanDeMeyer