
Revise documentation on how to use a branch of a component within a CESM checkout

Open billsacks opened this issue 4 years ago • 20 comments

I periodically get questions about how to point to a branch of a component (say, CAM, CTSM or CISM) within a CESM checkout (most recently from @Ivanderkelen). We have some documentation in the CESM README file, but I'm not sure that what we have is actually the method that makes the most sense. In particular, there are at least two problems with telling people to point to their branch in Externals.cfg:

  1. Support for branches with manage_externals still leaves a lot to be desired (see https://github.com/ESMCI/manage_externals/issues/34)

  2. Additional problems are caused if you change an Externals.cfg file in the middle of the tree (rather than the top-level file). For example, if you want to point to a different version of FATES (an external of CTSM), you might be tempted to modify the file components/clm/Externals_CLM.cfg. However, if you do that, when you rerun manage_externals from the top level (from the root of the CESM clone), you will get an error because the components/clm external has a modified file. (@ekluzek has suggested allowing checkout_externals to proceed if there are modifications in these Externals files. This may be worth considering, although it might be hard to implement and the implications should be thought through carefully.)

So I'm inclined to recommend that people checkout their branch using regular git commands. For example, if you want to checkout your branch of CTSM in a CESM checkout, I would recommend getting CESM as normal and doing an initial run of manage_externals/checkout_externals. Then do:

cd components/clm
git remote add ...
git fetch ...
git checkout ...

However, this isn't completely straightforward if your branch is of a component that has its own sub-externals (e.g., you have a branch of CTSM, which has a FATES external). In this case, I would probably still recommend using the above procedure to get your branch, but then getting any sub-externals by running the following from within components/clm (NOT from the CESM top level):

./manage_externals/checkout_externals clm
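
The steps above can be sketched as a small Python helper that only assembles the commands and prints them; it does not run anything. The remote name `myfork`, the fork URL, and the branch name `my_branch` are hypothetical placeholders. Using `git checkout -b <branch> <remote>/<branch>` to create a local tracking branch is one reasonable choice, not the only one.

```python
import shlex


def branch_checkout_steps(component_dir, remote, url, branch, sub_external=None):
    """Return (working_directory, argv) pairs for switching one component to a branch."""
    steps = [
        (component_dir, ["git", "remote", "add", remote, url]),
        (component_dir, ["git", "fetch", remote]),
        # Create a local branch that tracks the fetched remote branch.
        (component_dir, ["git", "checkout", "-b", branch, f"{remote}/{branch}"]),
    ]
    if sub_external is not None:
        # Run from inside the component, NOT from the CESM top level.
        steps.append(
            (component_dir, ["./manage_externals/checkout_externals", sub_external])
        )
    return steps


# Hypothetical placeholders: substitute your own fork URL and branch name.
steps = branch_checkout_steps(
    "components/clm",
    remote="myfork",
    url="https://github.com/EXAMPLE_USER/CTSM.git",
    branch="my_branch",
    sub_external="clm",
)
for cwd, argv in steps:
    print(f"(cd {shlex.quote(cwd)} && {shlex.join(argv)})")
```

Leaving out `sub_external` gives just the three git steps, which is enough for a component that has no sub-externals of its own.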

I'd like to hear some thoughts from others. I don't feel in a rush to update this documentation, so I think it's best if we take some time to gather thoughts until we feel pretty happy with a recommendation for users.

@gold2718 @cacraigucar @nusbaume @ekluzek @Katetc @mnlevy1981 @alperaltuntas @jedwards4b @mvertens @rsdunlapiv @uturuncoglu

billsacks avatar Mar 18 '20 18:03 billsacks

I think there may not be a one-size-fits-all recommendation. I end up doing both approaches depending on my needs.

I personally start out with changing the Externals.cfg file and pointing to my branch in it. That is easy and suits my needs a good portion of the time.

At other times (listed below), I will remove (or rename) the component in question and do a git clone ... git checkout ... by hand.

The times that I retrieve the component by hand are:

  • I have made mods anywhere and checkout_externals doesn't work
  • I plan on making mods to my fork, so I don't have a detached head and can commit my changes easily

I suppose we could document the two approaches along with the times when a user might prefer one over the other.
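
For the first approach, a minimal sketch of what "pointing to my branch" in the top-level Externals.cfg could look like, done programmatically with Python's `configparser`. The entry shown is illustrative: `EXAMPLE_TAG`, the fork URL, and `my_branch` are hypothetical placeholders, and the sketch assumes an entry should name only one of `tag`, `branch`, or `hash`.

```python
import configparser
import io

# Illustrative entry only: EXAMPLE_TAG is a placeholder, not a real CTSM tag.
EXAMPLE_CFG = """\
[clm]
tag = EXAMPLE_TAG
protocol = git
repo_url = https://github.com/ESCOMP/CTSM
local_path = components/clm
externals = Externals_CLM.cfg
required = True

[externals_description]
schema_version = 1.0.0
"""


def point_to_branch(cfg_text, section, repo_url, branch):
    """Return cfg_text with one external switched from a tag/hash to a branch."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Assumption: an entry should name only one of tag, branch, or hash.
    for key in ("tag", "hash"):
        parser.remove_option(section, key)
    parser.set(section, "repo_url", repo_url)
    parser.set(section, "branch", branch)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical fork URL and branch name.
updated = point_to_branch(
    EXAMPLE_CFG, "clm", "https://github.com/EXAMPLE_USER/CTSM", "my_branch"
)
print(updated)
```

In practice most people would make the same two-line change in a text editor; the point is only which keys change.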

cacraigucar avatar Mar 18 '20 19:03 cacraigucar

Hi all,

I really, really prefer to work with the Externals.cfg files. I find that they are an important way to document exactly which versions of which components you are using for your experiments. This is super important for papers and reproducibility. I think it would be very valuable to have manage_externals ignore the changes to an Externals.cfg file, so that all checkouts can be done using one method and one tool. That would be my vote.

Katetc avatar Mar 18 '20 19:03 Katetc

If we were to additionally modify manage_externals to have a flag to allow checkouts to not have detached heads, then I too could depend on it 100% of the time (assuming Kate's suggestion was also addressed).

cacraigucar avatar Mar 18 '20 19:03 cacraigucar

@cacraigucar if the top level external points to a branch in a component, by default it checks it out as a detached-head, but you can change it to a branch using git commands, and manage_externals will continue to work correctly after that. Now, it does seem like having an option for manage_externals to checkout the branch properly rather than as a detached head would be useful here.

ekluzek avatar Mar 18 '20 20:03 ekluzek

The way I recommend is that if the component in question is at the top level (in Externals.cfg) you edit it there and use manage_externals to manage the branch. If you have to point to something that's in a lower level -- right now you have to do it with git commands.

One advantage of using manage_externals for Externals.cfg is that the CIME case system documents manage_externals results along the way. So if the top level Externals.cfg file is correct it's going to add it to the documentation of the case or test. It'll also give good documentation when you run the status command. That's part of why I think it's good when you can to use manage_externals for at least the top level.

ekluzek avatar Mar 18 '20 20:03 ekluzek

@cacraigucar also has a good point about when you have uncommitted mods. Sometimes you have something small that you don't really want to commit or create a branch for. One answer is git's "stash" command that allows you to keep a few of these types of things around with a little less overhead. It still has some overhead though.

Personally I usually work around it either by using "git stash" or by saving a local copy of what I changed before running manage_externals and then copy it back in afterwards. Saving a copy works when it's a single file, which is my typical case for wanting to do that.

ekluzek avatar Mar 18 '20 20:03 ekluzek

@Katetc I'm not sure what you mean by this...

I think it would be very valuable to have manage_externals ignore the changes to an Externals.cfg file, so that all checkouts can be done using one method and one tool.

So are you saying it would use the latest version of Externals.cfg in git and ignore the modified version? That would be a problem while you are developing a new branch or tag or even a new git repository for that matter.

I guess I could see an option where manage_externals was told to abort if Externals.cfg has been modified. I think maybe you are thinking of the case where you've checked out a given tag, and really want the externals to match exactly what that tag was? If so, that does make sense for cases where I've cloned something that I want pinned to one specific tag and never want to change (like for control simulations).

ekluzek avatar Mar 18 '20 20:03 ekluzek

@ekluzek , sorry I was a bit vague there. I meant that manage_externals should ignore changes to Externals files in subdirectories. Such as the Externals_CISM.cfg file. Currently, if that one is changed, then manage_externals will not proceed due to the modification. I think it should treat a subdirectory where only an Externals.cfg file is changed as being unchanged. In other words, treat Externals.cfg files in sub directories the same way it does the ones in the main directory.

Katetc avatar Mar 18 '20 20:03 Katetc

Thank you all for your responses so far. For now I just want to reply to @Katetc 's latest comment:

manage_externals should ignore changes to Externals files in subdirectories. Such as the Externals_CISM.cfg file. Currently, if that one is changed, then manage_externals will not proceed due to the modification. I think it should treat a subdirectory where only an Externals.cfg file is changed as being unchanged.

This is a lot easier said than done, for two reasons (and I acknowledge that I'm the one who brought this up based on earlier conversations with @ekluzek , so I'm not pointing a finger at you for asking for it here):

  1. Currently, manage_externals just checks the git status of each external; if there are modified files, it calls the tree dirty:

https://github.com/ESMCI/manage_externals/blob/c33a3bd2a856dec33febf7f3fda30a4b0b9af608/manic/repository_git.py#L572-L597

What you're asking for would require parsing the output and ignoring files that match a certain pattern. Note that there is no required naming convention of files like Externals_CLM.cfg - that's just a convention. So this gets even trickier to do robustly.

You said:

In other words, treat Externals.cfg files in sub directories the same way it does the ones in the main directory.

but it's not that easy. Externals.cfg in the main directory is ignored implicitly, because it isn't a member of any externals.

  2. Assuming we could take care of (1), there is the issue of what happens if manage_externals tries to update an external that has a modified Externals_FOO.cfg file. In some cases, this would lead to problems in the update, which is what we were trying hard to avoid. For this reason, I feel that ignoring some modified files is ill-advised.

Sorry, I forgot about point (2) when suggesting this possibility in my original comment. I am now thinking we really shouldn't do this.

Now what we could possibly do is generally relax the checking in manage_externals, so that it still allows an update in the case of a dirty sandbox, as long as it isn't trying to update a component that is itself dirty. (So if you have modified components/cism/Externals_CISM.cfg, then manage_externals will run happily as long as the top-level Externals.cfg isn't trying to change components/cism.)
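
A minimal sketch of that relaxed rule, written as a standalone Python function and not as manage_externals code: an update is refused only for externals that both need to change and have local modifications. The function name, the dictionaries, and the tag names are hypothetical illustrations.

```python
def blocked_externals(current_refs, desired_refs, dirty):
    """Return the externals that would need to change but have local modifications."""
    return sorted(
        name
        for name, wanted in desired_refs.items()
        if current_refs.get(name) != wanted and name in dirty
    )


# Hypothetical state: components/cism has a modified Externals_CISM.cfg.
current = {"components/cism": "cism_tag_A", "components/clm": "ctsm_tag_A"}
dirty = {"components/cism"}

# Case 1: the top-level file does not move cism, so nothing blocks the run.
only_clm_moves = {**current, "components/clm": "ctsm_tag_B"}
print(blocked_externals(current, only_clm_moves, dirty))

# Case 2: the top-level file wants a different cism, so the run should stop.
cism_moves = {**current, "components/cism": "cism_tag_B"}
print(blocked_externals(current, cism_moves, dirty))
```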

But I'm also having trouble reconciling these thoughts with your earlier comment:

I really, really prefer to work with the Externals.cfg files. I find that they are an important way to document exactly which versions of which components you are using for your experiments. This is super important for papers and reproducibility.

For this to work well, don't you need to have your changes committed to a branch, and have that branch pushed to somewhere on GitHub?

billsacks avatar Mar 18 '20 20:03 billsacks

Note, I'm setting up a time to discuss this tomorrow at 11:00am MDT. Ping me if you want to be added to the group.

ekluzek avatar Mar 18 '20 21:03 ekluzek

Ok, so I see some of the issues here. Thanks for explaining that. There is a lot to consider here and I'm having a hard time framing my thoughts in a coherent message this afternoon. I agree with everything you said Bill, but I find it hard to believe that the ONLY solution to this problem (i.e., in my sandbox, I want to run with a different src_cism than the release) is that we need a new wrapper branch every time, or stop using manage_externals (and then your externals don't match the Externals_FOO.cfg file).

Though, I wrote another entire comment about how checking your code into a branch is generally a good idea, and maybe we should just suggest branches off people's forks to address this. But, that could be tough for plenty of other reasons. So, I deleted my comment.

We can talk more about this at the meeting tomorrow.

Katetc avatar Mar 18 '20 22:03 Katetc

There is a lot to consider here and I'm having a hard time framing my thoughts in a coherent message this afternoon

Me too, but @ekluzek I'd like to be included in the discussion tomorrow (hopefully I'll have some clearer thoughts by then)

In my mind, it would be great to include the features @Katetc and @cacraigucar are requesting:

  1. A way to signal to manage_externals that some files can differ from what's currently in git without aborting the entire checkout
  2. A way to specify that you want to end up on the head of the branch instead of in a detached head state

So far I've had a couple of awesome ideas that were decidedly less awesome by the time I got halfway through writing them up, but maybe others can expand on my half-baked ideas tomorrow :) (Basically, I'd like a .manage_externals_ignore file that signals to manage_externals that it's okay if a specific file has been modified, but I think it maybe only works in the very narrow scope where the current checkout / detached head matches where manage_externals wants it to be? That avoids the pitfall of pattern-matching @billsacks mentioned in https://github.com/ESCOMP/CESM/issues/139#issuecomment-600852231, while also matching his second criterion for when to allow "dirty" checkouts... but with the added safety feature that only certain files are allowed to be modified.)
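
To make the half-baked idea a little more concrete, here is a rough Python sketch of the check it implies. Nothing here exists in manage_externals: the `.manage_externals_ignore` file is the proposal itself, represented below as a plain set of exact paths (no pattern matching), and skipping untracked files is an assumption of this sketch. It reads the text produced by `git status --porcelain` and handles only simple, unquoted paths.

```python
def changed_tracked_paths(porcelain_output):
    """Extract paths of changed tracked files from `git status --porcelain` text."""
    paths = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue
        # Format: two status characters, one space, then the path.
        status, path = line[:2], line[3:]
        if status == "??":
            continue  # assumption of this sketch: untracked files do not count
        if status[0] in "RC" and " -> " in path:
            path = path.split(" -> ", 1)[1]  # renames/copies read "old -> new"
        paths.append(path)
    return paths


def may_proceed(porcelain_output, allowed_paths, at_expected_ref):
    """Allow the run only if HEAD is where expected and every change is allow-listed."""
    unexpected = [
        p for p in changed_tracked_paths(porcelain_output) if p not in allowed_paths
    ]
    return at_expected_ref and not unexpected


# Hypothetical allow-list, as it might be read from .manage_externals_ignore.
allowed = {"Externals_CISM.cfg"}

only_cfg_changed = " M Externals_CISM.cfg\n?? notes.txt\n"
source_changed_too = " M Externals_CISM.cfg\n M source/example.F90\n"

print(may_proceed(only_cfg_changed, allowed, at_expected_ref=True))
print(may_proceed(source_changed_too, allowed, at_expected_ref=True))
print(may_proceed(only_cfg_changed, allowed, at_expected_ref=False))
```

Listing exact paths sidesteps the naming-convention problem, at the cost of each component having to maintain the list.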

mnlevy1981 avatar Mar 18 '20 22:03 mnlevy1981

  1. A way to signal to manage_externals that some files can differ from what's currently in git without aborting the entire checkout

I just remembered that I already opened this issue: https://github.com/ESMCI/manage_externals/issues/112 . If people feel that's a good idea, then it would take care of this point, at least in some use cases.

A way to specify that you want to end up on the head of the branch instead of in a detached head state

We (largely you, @mnlevy1981 , together with @gold2718 ) sketched out a design for this in https://github.com/ESMCI/manage_externals/issues/34. I don't feel this needs more discussion right now.

What both of these issues have lacked is someone with the time to implement them.

billsacks avatar Mar 18 '20 23:03 billsacks

Also, my experience with manage_externals is that it is a lot harder than you first imagine to come up with behavior that is robust and doesn't break some other use case. This is not an issue with manage_externals per se, but rather is due to the complexity of the problem it is trying to solve. What has helped before is to have very specific use cases, detailing exactly what you want to be able to do, and what you want the behavior of manage_externals to be. The discussion in this issue has so far been too vague for me to feel like I have a good handle on it.

So I'd like to request that, before we have this meeting, people who feel manage_externals should operate differently please write up specific and detailed use cases laying out what they want. I'd like to be able to read these over and think about them a bit before we have a discussion. This probably argues for pushing the meeting back to at least next week.

billsacks avatar Mar 18 '20 23:03 billsacks

I'll comment in https://github.com/ESMCI/manage_externals/issues/112 with my thought of introducing a file that specifies what files are allowed to have changed without triggering an abort.

Also, I haven't changed my mind from two years ago (though I'll admit to having forgotten my opinion for many of the intervening months), and still like the workflow specified in https://github.com/ESMCI/manage_externals/issues/34#issuecomment-355073105

mnlevy1981 avatar Mar 18 '20 23:03 mnlevy1981

Sorry for being late in the discussion. Although I have no objection to fully relying on manage_externals, my preference is to recommend that people check out their branch using regular git commands, mainly because many are already familiar and comfortable with git, and I, personally, sometimes find manage_externals an unnecessary level of indirection. Coincidentally, I am in the middle of putting together a “Development and Testing” guideline for MOM6, and working directly with git commands is exactly what I suggest in the document (a work in progress): https://github.com/ESCOMP/MOM_interface/wiki/Development-and-Testing

I'd like to request that, before we have this meeting, people who feel manage_externals should operate differently please write up specific and detailed use cases laying out what they want.

One enhancement I'd suggest for manage_externals is the ability to check out an external recursively, i.e., together with its git submodules. @gold2718 has recently added the option of getting externals from git submodules, but we still need to list the submodules in an auxiliary Externals file. In the case of MOM6, for instance, we have an interface repository called MOM_interface, which encapsulates the core MOM6 repository. The core MOM6 repository has several submodules that need to be listed in an Externals file in MOM_interface: https://github.com/ESCOMP/MOM_interface/blob/master/Externals_MOM.cfg If we could instruct manage_externals to check out MOM6 recursively, then we would not need to maintain a secondary externals file Externals_MOM.cfg for MOM6 submodules, which would simplify the workflow, especially for those who prefer to work with git directly.

alperaltuntas avatar Mar 18 '20 23:03 alperaltuntas

Alper,

We use manage_externals with the UFS project, where there is a hierarchy of git submodules, and it works there. Have you tried it with MOM6 lately? If I remember right, part of the issue with MOM6 is that we didn't want all the submodules. But I may be misremembering that.

-- Jim Edwards

CESM Software Engineer National Center for Atmospheric Research Boulder, CO

jedwards4b avatar Mar 18 '20 23:03 jedwards4b

So I'd like to request that, before we have this meeting, people who feel manage_externals should operate differently please write up specific and detailed use cases laying out what they want. I'd like to be able to read these over and think about them a bit before we have a discussion. This probably argues for pushing the meeting back to at least next week

I would be happy to do this. But, yes, I don't think I can get it done today.

Katetc avatar Mar 19 '20 16:03 Katetc

We started the discussion on this. Half of the half-dozen people on the call would do git commands for branches, and the other half would make branches for the purpose of having manage_externals take care of everything.

The other use-case that makes a difference is if it's the production environment for running simulations versus development environment. We all agreed that in the production environment you want manage_externals to do everything for you, and you want it to make sure everything is correct and not allow something incorrect to exist unchallenged. So the current behavior is probably the best.

But, that also sets up a framework where our development environment is inconsistent with the production environment. Note, we have a similar but worse problem with people who develop using SourceMods -- there are always issues when we move them into the git world. This difference also means that sometimes you forget to update and commit the Externals files for instance. So I think the primary thing we want to do is to add some additional features as options to manage_externals to make it more useful in the development framework, and so it would allow developers to use it exclusively for external management.

ekluzek avatar Mar 19 '20 21:03 ekluzek

Note also that @jedwards4b pointed out a feature of manage_externals that most of us hadn't realized existed: checkout_externals accepts positional arguments naming the externals to check out, so a run can be limited to the list you give.

So for example if you just want to update cism you would do...

[eureka:~/Sandboxes/ctsm_relfatesndepmiscupdate] erik% ./manage_externals/checkout_externals cism
./manage_externals/checkout_externals cism
Processing externals description file : Externals.cfg
Processing externals description file : Externals_CISM.cfg
Checking status of externals: cism, source_cism,
Checking out externals: cism,
Processing externals description file : Externals_CISM.cfg
Checking out externals: source_cism,

You could also update just a single externals file using the "-e" (--externals) option, like this. (Make sure you are in the directory where that externals file exists.)

[eureka:~/Sandboxes/ctsm_relfatesndepmiscupdate] erik% ./manage_externals/checkout_externals -e Externals_CLM.cfg
./manage_externals/checkout_externals -e Externals_CLM.cfg
Processing externals description file : Externals_CLM.cfg
Checking status of externals: fates, ptclm,
Checking out externals: fates, ptclm,

ekluzek avatar Mar 19 '20 21:03 ekluzek