Feature/detect frontera

Open benjamin-cash opened this issue 1 year ago • 31 comments

Commit Queue Requirements:

  • [x] Fill out all sections of this template.
  • [ ] All sub component pull requests have been reviewed by their code managers.
  • [ ] Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • [ ] Commit 'test_changes.list' from previous step

Description:

Commit Message:

* UFSWM - This only affects detect_machine.sh
  

Priority:

  • Normal

Git Tracking

UFSWM:

  • Closes # https://github.com/NOAA-EMC/global-workflow/issues/2570 (Note that I did not realize ufs-weather-model was the authoritative copy of this script when I created the issue)

Sub component Pull Requests:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • No Baseline Changes.

Input data Changes:

  • None.

Library Changes/Upgrades:

Testing Log:

  • RDHPCS
    • [ ] Hera
    • [ ] Orion
    • [ ] Hercules
    • [ ] Jet
    • [ ] Gaea
    • [ ] Derecho
  • WCOSS2
    • [ ] Dogwood/Cactus
    • [ ] Acorn
  • [ ] CI
  • [ ] opnReqTest (complete task if unnecessary)

benjamin-cash avatar May 03 '24 17:05 benjamin-cash

Letting @NOAA-EMC/teams/global-workflow-admins know so they can get a PR to bring in these changes as needed.

Thought that team links worked, but alas they don't. @aerorahul letting you know detect_machine.sh is getting a change.

BrianCurtis-NOAA avatar May 03 '24 18:05 BrianCurtis-NOAA

@BrianCurtis-NOAA - I have another set of small changes in this same line. One is a minor update to modules-setup.sh, and the other is to add ufs_frontera.intel.lua to modulefiles. Would it make sense to just add those into this PR, or should I let this one close out and then open a new one?

benjamin-cash avatar May 03 '24 20:05 benjamin-cash

You can keep making changes here, just let me know when you're done.

BrianCurtis-NOAA avatar May 03 '24 20:05 BrianCurtis-NOAA

@benjamin-cash Are you also planning to activate rt.sh on Frontera? I lost track of what I added to ufs-coastal, but you could check there if you want: https://github.com/oceanmodeling/ufs-coastal BTW, let me know if you need help. It would be nice to have Frontera support at the ufs-weather-model level. Thanks for doing it.

uturuncoglu avatar May 03 '24 20:05 uturuncoglu

@uturuncoglu - If you have rt.sh working for Frontera I would love to get that into ufs-weather-model as well. I think that can be separated from this PR though, so I won't add anything more to this one. (@BrianCurtis-NOAA)

benjamin-cash avatar May 03 '24 20:05 benjamin-cash

@benjamin-cash Yes, rt.sh is working in UFS Coastal and we are running ufs-coastal-specific RTs with it. I synced ufs-coastal with ufs-weather-model a couple of days ago. If you look at the following diff you should see the changes around rt.sh, along with ufs-coastal-specific changes such as extra components: https://github.com/ufs-community/ufs-weather-model/compare/develop...oceanmodeling:ufs-coastal:feature/coastal_app

uturuncoglu avatar May 03 '24 20:05 uturuncoglu

@uturuncoglu - I tried running cpld_control_c192_p8 via the coastal rt.sh and ran into errors. It looks like the only variable added for frontera in default_vars.sh was TPN=56; none of the other variables, such as INPES_dflt, were set. Did I miss a step, or would those still need to be updated to run the rest of the tests?

benjamin-cash avatar May 03 '24 22:05 benjamin-cash
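
The gap described in the comment above (one Frontera value present, the rest missing) can be surfaced early with a small check. The following is a minimal sketch, not the project's defaults file: the helper name `missing_vars` is hypothetical, and only `TPN=56` and the variable name `INPES_dflt` come from the thread. No Frontera value for `INPES_dflt` is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of variables that are unset or empty.
missing_vars() {
  local name
  local missing=()
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name
    [[ -n "${!name:-}" ]] || missing+=("${name}")
  done
  echo "${missing[*]}"
}

unset TPN INPES_dflt          # start from a clean environment for the demo
MACHINE_ID="frontera"

case "${MACHINE_ID}" in
  frontera)
    TPN=56                    # the only Frontera default mentioned in the thread
    # INPES_dflt and similar layout defaults are deliberately left unset here
    ;;
esac

missing="$(missing_vars TPN INPES_dflt)"
if [[ -n "${missing}" ]]; then
  echo "No ${MACHINE_ID} defaults for: ${missing}"
fi
```

Running a check like this before launching a test would report the missing layout variables up front instead of failing partway through the run.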

@benjamin-cash Yes, that needs to be extended. I have no experience with those numbers. Maybe @BrianCurtis-NOAA could help with it.

uturuncoglu avatar May 03 '24 22:05 uturuncoglu

@benjamin-cash I just set that one number and it works with the coastal app, but the other RTs probably use more platform-specific parameters. If we could add the others, that would be great.

uturuncoglu avatar May 03 '24 22:05 uturuncoglu

@uturuncoglu - Makes sense. I'm going to try just copying in the settings for Derecho and see how far it makes it.

benjamin-cash avatar May 03 '24 22:05 benjamin-cash

@benjamin-cash Okay. If we could find another platform that is as close as possible to Frontera, it would be a good starting point.

uturuncoglu avatar May 03 '24 22:05 uturuncoglu

@uturuncoglu - That was enough to at least get the test started, but then it failed because it was looking for the wrong WW3_input_data_* directory. I'll have to track down where that is set.

benjamin-cash avatar May 04 '24 00:05 benjamin-cash

@benjamin-cash It is probably pointing to my input directory. Is there any place on Frontera where we could stage at least part of the UFS input data? Then maybe we could just put the input files for the coupled control p8 test there and point to that location with the disknm variable in the Frontera part of rt.sh.

uturuncoglu avatar May 05 '24 03:05 uturuncoglu

There are a couple of options for data. One is that we could store the data on Ranch (Frontera archive system), and then stage the data to $SCRATCH on Frontera and recopy as needed. Or someone who is working on UFS on the system but not storing a lot of data otherwise could keep the files in their $WORK space. Do you know what the data volume is?

benjamin-cash avatar May 05 '24 15:05 benjamin-cash

@benjamin-cash The input folder on Derecho, /glade/derecho/scratch/epicufsrt/ufs-weather-model/RT/NEMSfv3gfs/input-data-20240501, is around 275G. I think this includes all the data, but if it is too much, maybe we could selectively copy just a couple of folders to run the major tests.

uturuncoglu avatar May 05 '24 18:05 uturuncoglu

@uturuncoglu - At 275GB we can definitely find somewhere for that to sit. I don't think I have access to Derecho at this point. Could you globus that directory to $SCRATCH on Frontera and let me know where you put it? I can figure out a more permanent location for it from there.

benjamin-cash avatar May 05 '24 21:05 benjamin-cash

@benjamin-cash Sure. Let me copy it over. I'll let you know when it is finished.

uturuncoglu avatar May 06 '24 03:05 uturuncoglu

@benjamin-cash I copied the files to /scratch1/01118/tg803972/RT/NEMSfv3gfs. Let me know if you have any issues accessing them. I think you need to create a develop-20240430 folder under this directory to run any RT. Then maybe we could place baselines for a couple of RTs under develop-20240430. I am not sure whether running the full test suite on Frontera is feasible at this point.

uturuncoglu avatar May 06 '24 04:05 uturuncoglu
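
The directory layout described in the comment above can be sketched as follows. This is an illustration only: the variable names `DISKNM`, `BL_DATE`, `INPUT_DATE`, `RTPWD`, and `INPUTDATA_ROOT` are assumptions about how a regression-test script might be configured, and a temporary directory stands in for the staging root so the sketch runs anywhere. The two date tags are the ones quoted in the thread.

```shell
#!/usr/bin/env bash
# Stand-in for the staging root so the sketch runs on any machine.
# On Frontera this would be the parent of the NEMSfv3gfs directory named above.
DISKNM="${DISKNM:-$(mktemp -d)}"
BL_DATE="20240430"       # baseline tag mentioned in the thread
INPUT_DATE="20240501"    # input-data tag from the Derecho path in the thread

# Assumed layout: baselines and input data live side by side under NEMSfv3gfs.
RTPWD="${DISKNM}/NEMSfv3gfs/develop-${BL_DATE}"
INPUTDATA_ROOT="${DISKNM}/NEMSfv3gfs/input-data-${INPUT_DATE}"

# Create the (initially empty) baseline directory.
mkdir -p "${RTPWD}"

# Warn rather than fail if the input data has not been staged yet.
[[ -d "${INPUTDATA_ROOT}" ]] || echo "input data not staged yet: ${INPUTDATA_ROOT}"
echo "baseline directory: ${RTPWD}"
```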

@uturuncoglu - This discussion has wandered pretty far afield from the PR, so I'm going to move the discussion of the rt files to email. :)

benjamin-cash avatar May 06 '24 14:05 benjamin-cash

@BrianCurtis-NOAA - It looks like this is stuck waiting on reviews to come in (assuming updating didn't break anything just now), any chance you could help nudge this along?

benjamin-cash avatar Jul 08 '24 15:07 benjamin-cash

@benjamin-cash and @uturuncoglu Just confirming this has been tested on Frontera and works as expected with the tests you are able to run?

If so, @jkbk2004 can make sure to get this combined with another PR as we don't need to worry about baselines (as far as I understand).

BrianCurtis-NOAA avatar Jul 08 '24 15:07 BrianCurtis-NOAA

Hi @BrianCurtis-NOAA - The changes to detect_machine.sh are something I've used multiple times when I've downloaded the weather model, so they should be good to go. The module changes have been somewhat overtaken by events - we now have spack-stack working via container on Frontera.

benjamin-cash avatar Jul 08 '24 16:07 benjamin-cash
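
For readers unfamiliar with the script, hostname-based machine detection of the kind this PR touches can be sketched as below. This is a minimal illustration, not the contents of detect_machine.sh: the function name `detect_machine_id` is hypothetical, and the Frontera hostname pattern is an assumption about how the system's nodes are named.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a fully qualified hostname to a machine ID.
# With no argument, fall back to the current host's name.
detect_machine_id() {
  local host="${1:-$(hostname -f 2>/dev/null || hostname)}"
  case "${host}" in
    *.frontera.tacc.utexas.edu) echo "frontera" ;;   # assumed Frontera pattern
    *)                          echo "UNKNOWN"  ;;   # no known platform matched
  esac
}

# Example with an explicit hostname, so the result does not depend on where this runs.
MACHINE_ID="$(detect_machine_id "login1.frontera.tacc.utexas.edu")"
echo "MACHINE_ID=${MACHINE_ID}"
```

Passing the hostname as an argument keeps the function easy to exercise off-platform, which matters when the target system is one that few reviewers can log in to.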

@benjamin-cash @BrianCurtis-NOAA I am testing on my end too. I'll update you soon.

uturuncoglu avatar Jul 08 '24 16:07 uturuncoglu

@benjamin-cash I think there is an issue with the ufs_frontera.intel.lua file. There are some HTML tags in it, so it is probably corrupted.

uturuncoglu avatar Jul 08 '24 16:07 uturuncoglu
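
A module file that was accidentally saved as a web page, as reported above, can be caught with a simple pattern check. The sketch below is an assumption-laden illustration: the function name `looks_like_html` and the tag list are hypothetical, and the two sample files are made up for the demo. The pattern is kept narrow because `<` is also a valid Lua comparison operator.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed (exit 0) if the file appears to contain HTML markup.
looks_like_html() {
  grep -Eiq '<!DOCTYPE|</?(html|head|body|div|span|script)[ >]' "$1"
}

# Build two throwaway sample files: a plausible Lua snippet and an HTML page.
tmpdir="$(mktemp -d)"
printf 'help([[Load UFS environment]])\nif a < b then end\n' > "${tmpdir}/good.lua"
printf '<!DOCTYPE html>\n<html><body>Not Found</body></html>\n' > "${tmpdir}/bad.lua"

for f in "${tmpdir}/good.lua" "${tmpdir}/bad.lua"; do
  if looks_like_html "${f}"; then
    echo "$(basename "${f}"): contains HTML markup"
  else
    echo "$(basename "${f}"): looks clean"
  fi
done

rm -rf "${tmpdir}"
```

A check like this could be pointed at the files under modulefiles before committing.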

Hi @uturuncoglu - Yikes, yeah, no idea how I managed to do that. Could you point me to the module file you have tested on Frontera and I will replace?

benjamin-cash avatar Jul 08 '24 16:07 benjamin-cash

@benjamin-cash We are using the following with UFS Coastal - https://github.com/oceanmodeling/ufs-weather-model/blob/feature/coastal_app/modulefiles/ufs_frontera.intel.lua but I think you need to change the paths for your installation.

uturuncoglu avatar Jul 08 '24 16:07 uturuncoglu

I have not tried to fix yours yet, but I could if you want.

uturuncoglu avatar Jul 08 '24 16:07 uturuncoglu

@benjamin-cash BTW, it seems that you don't have any changes on the rt.sh side. Are you planning to make them? UFS Coastal could still maintain its own rt.sh changes.

uturuncoglu avatar Jul 08 '24 16:07 uturuncoglu

When this PR is ready, we can combine it with #2335 and #2278.

jkbk2004 avatar Jul 08 '24 16:07 jkbk2004

Hi @uturuncoglu - for this PR the module file was meant to be an exact copy of yours and to use the non-container version of spack-stack on Frontera. I hadn't made any changes to rt.sh in this PR, but maybe it would make sense to fold them in as well.

benjamin-cash avatar Jul 08 '24 16:07 benjamin-cash