mkosi icon indicating copy to clipboard operation
mkosi copied to clipboard

Optimize kmod.py::resolve_module_dependencies()

Open LaurenceKiln opened this issue 4 months ago • 11 comments

Instead of retrieving the modinfo for all modules on the system (around 4700 modules for 6.17.8-300.fc43.x86_64), retrieve only the modules specified by the user, and then transitively fetch the modinfo for dependencies.

When all modules are included, it makes no difference.

But, when the user significantly reduces the set of included modules, the cost of multiple calls to modinfo while chasing the dependency chains is much lower than the cost of querying and parsing all the kernel modules on the system, for data that is never used.

For example, KernelModules=default pulls in around 570 modules. With these changes the wall time for this build step is much reduced:

  • 9sec -> 5sec on ramdisk.
  • ~~85~~ 25sec -> 5sec (!) on a slow HDD.

LaurenceKiln avatar Dec 02 '25 13:12 LaurenceKiln

I've been working on properly testing this, so I haven't addressed the review notes yet.

(Everything below is on fedora)

  • I think testing found a bug in the old version. When modules include a non-existent module, the old version silently ignores it (it only checks the deps for missing modules)
  • Wrote a POC test to thoroughly compare input/output pairs for old and new implementations. You can run it directly via "./test_new_dep_res.py". It also shows runtime for old/new versions.
  • I've verified that:
  1. The function outputs match in all cases of interest I could identify (See test). Including testing hundreads of randomly selected groups of modules (including some with firmware)
  2. That it outputs the same warnings about dependencies being missing (e.g. cifs2 softdep on aead2.
  3. That missing modules (not deps) trigger a warning (for now, see Questions)

Questions:

  • Original version treated soft-dep and deps equally, which I don't like. It converted missing modules into a warning. Shouldn't a missing hard-dep stop the build?
  • What's the correct behavior for KernelModules=no_such_kmod? warn or fail?

Notes:

  • On fedora, modinfo crc32 returns "crc32-cryptoapi" which is aliased to crc32, while modinfo -m crc32 only shows the built-in crc32. The old modinfo didn't use -m but it doesn't matter because the code looked at builtin's first. To keep this behavior with the new impl, I switched to modinfo -m.
  • Much of runtime now seems to be spend in bringing up and tearing down the sandbox for modinfo, which comes in at about 2 seconds on my archaic laptop HDD. Probably less noticeable SSD, but perhaps there's room to optimize sandbox. In any case, modinfo seems pretty innocuous to justify spending a majority of its time building a sandbox.

LaurenceKiln avatar Dec 03 '25 13:12 LaurenceKiln

So we have to be slightly careful here because setting up the sandbox with python isn't exactly free so if we invoke modinfo too many times that'll start slowing us down. But since you did the measurements I think we should be fine.

Right. From random testing, the dependency chain can get up to around 5.

At least here that's still much faster than querying all modules on the system. Plus, for the default case with all modules (with a large enough chunk size) there's still just one modinfo call.

The obvious solution is to lift the sandbox to encompass all the modinfo calls, since sandbox creation is now the bottleneck.

Do run the test/bench and see if the tradeoff is different on your machine. My laptop is old and I can't rely on it to predict perf for common user.

LaurenceKiln avatar Dec 03 '25 17:12 LaurenceKiln

I'm not gonna bother reviewing the version with the changes to make your testing script work. I'll trust your measurements and verify myself with a quick image build after your stuff is merged. So ping me when you're happy with your measurements and I'll be happy to review

DaanDeMeyer avatar Dec 03 '25 19:12 DaanDeMeyer

Ok. I wouldn't like to merge this until I've figured out a way to share a chroot between multiple calls.

LaurenceKiln avatar Dec 03 '25 20:12 LaurenceKiln

Ok. I wouldn't like to merge this until I've figured out a way to share a chroot between multiple calls.

That's going to be non-trivial change. I'm not opposed if the numbers show a big win, but let's please get this working without the chroot reuse and then add the chroot reuse in a second PR.

DaanDeMeyer avatar Dec 03 '25 20:12 DaanDeMeyer

  1. Found out the sandbox calls are much slower on ext4. When running with btrfs, the sandbox overhead actually isn't that bad.

  2. For some reason, the mocking test runs much faster than a real build using the same module set, doing the same work. While the test (now with btrfs) shows a 10sec->0.5sec improvement, actually running two buildds with old and new code gives a drop of around 80sec (!) in runtime.

I don't understand why this is, and I'm not sure if it's a bug, an optimization opportunity, or expected behavior.

LaurenceKiln avatar Dec 04 '25 00:12 LaurenceKiln

  • Improve performance when KernelModules= restricts the set of included modules
  • Warn when a requested module is missing
  • Report final module/firmware count and list

Local benchmark on laptop with HDD with btrfs (cached build):

% time mkosi -f --kernel-modules=default build 

workspace         Change[sec]
tmpfs (ram)         -8 
HDD                -75

LaurenceKiln avatar Dec 04 '25 12:12 LaurenceKiln

@behrmann , @DaanDeMeyer, If there are no further review notes, I consider this ready to merge. Removed the testing commit.

LaurenceKiln avatar Dec 04 '25 12:12 LaurenceKiln

  • Warn early about missing requested modules.
  • No need to explicitly keep track of skipped.
  • Removed chunking logic. cmdline limit allows well over 100k+ typical-length names.
  • Simplified loop.

LaurenceKiln avatar Dec 05 '25 19:12 LaurenceKiln

  • Distinguish soft and hard dependencies for warnings
  • parse_modinfo_output returns a TypedDict

LaurenceKiln avatar Dec 05 '25 21:12 LaurenceKiln

  • warn about missing firmware files. This can be quite noisy sinceno firmware packages are installed unless the user explicitly included them inPackages=`, but I think we do want to be loud about missing firmware. Previously, there was no indication that firmware might be missing.

LaurenceKiln avatar Dec 06 '25 13:12 LaurenceKiln