env: use mydef to template mpicc scripts

Open hzhou opened this issue 2 years ago • 3 comments

Pull Request Description

We maintain the following scripts:

src/env/mpicc.sh.in
src/env/mpicc.bash.in
src/env/mpicxx.sh.in
src/env/mpicxx.bash.in
src/env/mpifort.sh.in
src/env/mpifort.bash.in

These files are highly redundant yet differ in subtle ways. They are tedious to maintain, and it is easy to introduce inconsistencies among them.

This PR introduces MyDef, a general-purpose macro/template system, so that we can write structured templates. As a result, a single mpicc.def replaces the six old files. Other than whitespace differences, the template should retain the original mechanism and introduce no functional changes.

[skip warnings]

Reference

  • https://github.com/hzhou/mydef_boot
  • https://github.com/hzhou/mydef

Author Checklist

  • [x] Provide Description: Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • [x] Commits Follow Good Practice: Commits are self-contained and do not do two things at once. Commit message is of the form: module: short description. Commit message explains what's in the commit.
  • [ ] Passes All Tests: Whitespace checker. Warnings test. Additional tests via comments.
  • [x] Contribution Agreement: For non-Argonne authors, check contribution agreement. If necessary, request an explicit comment from your company's PR approval manager.

hzhou avatar Apr 01 '23 23:04 hzhou

test:mpich/warnings/auto

hzhou avatar Apr 02 '23 02:04 hzhou

test:mpich/submodules

hzhou avatar Apr 02 '23 23:04 hzhou

I've thought about this some and have a few questions.

1. Can the .bash versions of these scripts be removed? Do they offer anything that the .sh version cannot do? I know we've discussed this in the past, but I do not recall the details.

The only reason we have two versions is that bash supports arrays and sh does not. Without arrays, the sh versions have to quote the args, and the quoting is hackish and known to be not 100% correct.
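
As a rough illustration of the array point (a minimal sketch, not code from the MPICH scripts; the function names `count_args`, `collect_with_array`, and `collect_with_string` are hypothetical), the following compares collecting arguments in a bash array with appending them to a flat string. The string variant is written in bash for convenience, but it mimics what a script without arrays is limited to.

```bash
#!/usr/bin/env bash
# Hypothetical helper names; not taken from the MPICH wrapper scripts.

# Report how many arguments a command would actually receive.
count_args() { echo "$#"; }

# bash approach: collect arguments in an array; each element stays one word.
collect_with_array() {
    local allargs=()
    local arg
    for arg in "$@"; do
        allargs+=("$arg")
    done
    count_args "${allargs[@]}"
}

# Array-free approach: append everything to one flat string.
# The unquoted expansion is re-split on whitespace, so an argument
# that contains a space turns into several words.
collect_with_string() {
    local allargs=""
    local arg
    for arg in "$@"; do
        allargs="$allargs $arg"
    done
    # shellcheck disable=SC2086
    count_args $allargs
}

array_count=$(collect_with_array -DNAME="two words" -O2 hello.c)
string_count=$(collect_with_string -DNAME="two words" -O2 hello.c)
echo "array: $array_count, string: $string_count"
```

The array version passes three arguments through intact, while the flat string splits `-DNAME=two words` in two. Avoiding that split without arrays is what forces the extra quoting logic in the sh versions.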

Also note that the redundancy is not just between sh and bash: there is a high degree of redundancy among mpicc, mpicxx, and mpifort, whose variations are less trivial. In fact, I believe we have introduced inconsistencies between these scripts. More often than not, we fixed or enhanced something in mpicc and then neglected mpicxx and mpifort. That inconsistency is what I want to address.

2. Why not just generate `mpicc` and friends from a Python script directly? For instance using triple-quoted strings with substitutions. I am probably missing some complexity that MyDef solves.

If it is just strings with substitutions, then shell variables will probably do. In fact, we have used shell variables as much as we can. There are more subtleties than that, though, and a Python script would introduce more logic than the subtleties themselves require, so we would end up maintaining both Python scripts and shell scripts. We already do this with the binding scripts. There, the savings from reduced duplication are (I think) still worth it. But here the balance of benefit and harm is not so clear.
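
To make the "strings with substitutions" case concrete, here is a minimal sketch of plain substitution done with shell variables and a here-document. It is only an illustration: the function name `emit_wrapper` and the three-line wrapper it prints are hypothetical and far simpler than the real scripts.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: the function name and wrapper contents are
# illustrative only and do not reflect the real MPICH wrappers.

# Emit a tiny wrapper script; $1 = wrapper name, $2 = underlying compiler.
emit_wrapper() {
    local name=$1 compiler=$2
    # Unquoted here-document delimiter: $name and $compiler are expanded,
    # while \$@ is escaped so the generated script keeps a literal "$@".
    cat <<EOF
#!/bin/sh
# $name: generated from a shared template
exec $compiler "\$@"
EOF
}

mpicc_text=$(emit_wrapper mpicc cc)
mpicxx_text=$(emit_wrapper mpicxx c++)
printf '%s\n' "$mpicc_text"
```

This covers only the case where the variants differ by a substituted value. It does nothing for differences that change the structure of the script, such as the array handling discussed above.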

Also, MyDef's purpose is not just templating, although templating is one of its use cases. MyDef is mainly meant to enable better refactoring. For example, sh and bash really differ in just one respect (the array feature), but the resulting code differences are scattered across several places. MyDef lets us bring that logic into a single place, so it is easier for maintainers to read and understand, and it helps prevent accidental inconsistencies.

3. Is the plan to expand MyDef usage to other areas of MPICH if this is accepted?

In my personal opinion, MyDef provides superior solutions to most of the scripting/macro/code organization issues that we have all been suffering from, so it is a no-brainer (for me). But it is not about me. I want to avoid a sole-maintainer situation as much as you do. My intention is really to introduce MyDef to the point where fellow developers either appreciate the solution as much as I do or send me enough feedback to reveal my blind spot and show that MyDef is not actually good. Even for this PR, my main intention is to demonstrate the pros and cons and have something to discuss. If in the end you are not convinced of the benefit, then I have no intention of merging it.

So far, my main feedback is unfamiliarity, which is fair feedback, so I am working on that ;)

4. Do downstream maintainers modify these scripts in their distributions? If so, will this change interfere with those modifications? I suppose they can work around it by just modifying the already generated files and adjusting their build scripts.

Exactly. One of the good aspects of MyDef is that it emphasizes the multi-layer nature of development and insists that the generated code be as readable, and as true to what a coder would write without MyDef, as possible. Think of MyDef as just a pre-autogen step or an editor tool that only helps and does not intrude. The only issue is that a team must agree on which version is the "true" source, or else bear the burden of synchronization/backporting.

My overall take is that these scripts are rarely updated, and templating them in some way is a good idea. That said, MyDef is unfamiliar, meaning the templates will likely have a sole maintainer in practice. This is essentially the state of all the code generation scripts in MPICH, so it's something we are currently able to tolerate.

Unfamiliarity is fair feedback. The question is how easy it is to grok when starting from unfamiliarity. If you don't know Python, it may take a week to learn. If you don't know bash, it may take a day to learn. If you don't know emacs or vim, it may take a few hours to learn. Of course, they all take months or a lifetime to master. I really aimed for people to be able to start working with MyDef after merely working through simple examples in a page-long tutorial. In fact, I naively believe that one can work with MyDef by just looking at the code and guessing, but I have probably underestimated the role of experience.

The sole-maintainer situation is bad, and avoiding it has always been my main intention. But I think you have been overestimating it in practice. For example, the Fortran binding and mpi_f08 seem very much impenetrable, yet you have been maintaining them here and there. The Python bindings are quite complex (well, too bad, it is complicated), yet I see you have been maintaining them here and there. Even external contributors have contributed patches. I understand that a lot of these patches bring along the pain of learning unfamiliar code, but that has always been the life of maintainers. Initially, we are all forced to work, reluctantly, on unfamiliar parts of the code, but as we keep maintaining the code, it becomes familiar to some degree. So I don't think the fear of a new technology/solution is a legitimate factor. Rather, the factors are: 1. how easy the solution is for unfamiliar developers to grok; 2. how the solutions (new and old) compare once we are familiar with both.

hzhou avatar Apr 29 '23 16:04 hzhou