MFC Operator splitting, adaptive time stepping, and other fixes for mixing layer and bubbles

Description

This PR introduces several new features including

Strang operator splitting with adaptive time stepping strategy (adap_dt = T),
Solving the number density equation directly and computing void fraction from the number density (for adv_n = T),
Fixing the instability wave code to apply slip-wall BC correctly,
Support for artificial Mach number (or artificial pi_inf) - use artificial pi_inf for background flow and true pi_inf for bubbles. The ratio between artificial and true pi_inf's, pi_fac, is added as an input parameter.

Type of change

Please delete options that are not relevant.

[x] Bug fix (non-breaking change which fixes an issue) - option 3
[x] New feature (non-breaking change which adds functionality) - option 1, 2, 4

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

[x] Passed test suite with CPUs on Richardson and Delta
[x] Passed test suite with GPUs on Delta
[x] Passed all CI tests on github
[x] Tested on a single collapsing bubble case - added to examples (0d_bubblecollapse_adap)
[x] Tested on a 2D mixing layer case - added to examples (2d_mixing_artificial_Ma)

Test Configuration:

What computers and compilers did you use to test this:

Test suite with CPUs on Richardson: /opt/gcc/9.3.0/bin/, /act/openmpi-2.0/gcc-9/bin/, python-3.10.13 (installed on my local dir)
Test suite with CPUs on Delta: gcc/11.4.0, openmpi, python/3.11.6 (modules loaded)
Test suite with GPUs on Delta: nvhpc/22.11, openmpi+cude/4.1.5+cuda, cmake, python/3.11.6 (modules loaded)

Checklist:

[x] I have added comments for new code
[x] I added Doxygen docstrings to new code
[x] I have made corresponding changes to the documentation (docs/)
[x] I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
[x] I have added example cases in examples/ that demonstrate my new feature performing as expected
[x] New and existing tests pass locally with my changes, including with GPU capability enabled and disabled
[x] This PR does not introduce any repeated code (it follows the DRY principle)
[x] I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in `src/`)

To make sure the code is performing as expected on GPU devices, I have:

[x] Checked that the code compiles using NVHPC compilers
[x] Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
[x] Enclosed the new feature via nvtx ranges so that they can be identified in profiles
[x] Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
[x] Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

Jan 01 '24 07:01 hyeoksu-lee

@lee-hyeoksu it looks like you regenerated all the goldenfiles. Why? It also looks like you updated the metadata files but not the goldenfiles in many cases. Please pull those metadata files back in if you aren't changing the goldenfile.

Jan 01 '24 16:01 sbryngelson

@sbryngelson With the current golden files, the CI tests failed like right now. I think it is due to my changes, although I am not sure which parts are responsible for that. That's why I regenerated the golden files in the previous commit, which was reverted now. Do you think the CI test should pass with the current golden files? Or if you think it's fine, I will update the golden files again.

Jan 04 '24 00:01 hyeoksu-lee

@lee-hyeoksu Do you expect a meaningful difference in output between current master and your branch for these test cases that fail? If so, you should regenerate the golden files for these cases specifically and validate the new results.

Jan 04 '24 00:01 henryleberre

@henryleberre I don't think there is an important update on output. Most changes are about input parameter and simulation process, not output file/data. I will check the code again and look into the test case to see what happens there.

Jan 04 '24 00:01 hyeoksu-lee

@lee-hyeoksu it is failing all of the sub-grid bubbles cases. I think we may need to have a discussion about having two problem length scales (a bubble one and another one). I am confident this is what is causing the issue, and even if you get them to match, it seems confusing.

I recommend bringing it up with Tim. Again, my preference is one length scale for the entire problem setup. Otherwise, this just feels like it's asking for confusion/problems.

Jan 04 '24 03:01 sbryngelson

@sbryngelson I tried to remove the second length and velocity scales (bubble scales). On this PR, I was able to remove the second velocity scale, but I needed at least one length scale parameter, which is len_ratio representing the ratio of two lenth scales. This is because the current MFC defines R0 that have values centered at 1 (for monodisperse, R0 = 1. But, here, R0 = 1 means R0 is normalized with the initial equilibrium radius, which introduces the use of length scale other than the background flow length scale. This is why I had to introduce len_ratio.

The use of len_ratio kind of parameter can be avoided by using the bubble scales (not background flow scales) for the entire simulation. In this case, however, I need to somehow rescale the output data to the background flow scales to analyze them.

I will discuss about this with Tim. If you have any suggestions/thoughts, it would be greatly appreciated!

Jan 04 '24 04:01 hyeoksu-lee

@sbryngelson I tried to remove the second length and velocity scales (bubble scales). On this PR, I was able to remove the second velocity scale, but I needed at least one length scale parameter, which is len_ratio representing the ratio of two lenth scales. This is because the current MFC defines R0 that have values centered at 1 (for monodisperse, R0 = 1. But, here, R0 = 1 means R0 is normalized with the initial equilibrium radius, which introduces the use of length scale other than the background flow length scale. This is why I had to introduce len_ratio.

The use of len_ratio kind of parameter can be avoided by using the bubble scales (not background flow scales) for the entire simulation. In this case, however, I need to somehow rescale the output data to the background flow scales to analyze them.

I will discuss about this with Tim. If you have any suggestions/thoughts, it would be greatly appreciated!

I'm not sure I understand. The length scale of the broader flow physics can be much larger than the bubble length scale without causing issues in the code or simulations. This just means that the relevant broader flow length scale is larger than unity (in code units), which is fine.

Jan 04 '24 04:01 sbryngelson

You can resolve the merge conflicts with the formatting discussion here: https://github.com/MFlowCode/MFC/pull/296

Jan 27 '24 02:01 sbryngelson

You can resolve the merge conflicts with the formatting discussion here: #296

I didn't know that the conflicts are related to the formatting so I resolved conflicts in a much more complicated way😅... I am not sure if it works or not, but if it doesn't work, I may need to revert those commits and try the way introduced in #296.

Jan 27 '24 06:01 hyeoksu-lee

I removed scaling parameters that I introduced previously, so the MFC has single normalization throughout the code.

For now, tests on Phoenix have not been done, but all the others passed. So I think this draft PR is ready for review!

Updated: Phoenix tests failed but I couldn't find detailed error messages. Benchmark tests do not show any error log and exit codes for cpu and gpu tests are 1 and 143, respectively. Could anyone help me figure out what's causing the failure?

Jan 27 '24 09:01 hyeoksu-lee

@lee-hyeoksu sorry for the delay. Please add regular MFC full documentation, code comments, and doxygen docstrings for everything you added. Thanks!

Feb 05 '24 03:02 sbryngelson

@lee-hyeoksu did you test your PR on GPUs with nvhpc/nvfortran? it appears to fail at compile time.

Feb 05 '24 22:02 sbryngelson

@sbryngelson Is there any procedure to test PR on GPUs with nvhpc/nvfortran other than just creating/updating PR? I haven't done simulations on GPUs or Phoenix, so I have no clue why CI tests on GPU/Pheonix failed, especially if it is due to compile time. Do you have any idea on what's causing the failure?

Feb 06 '24 22:02 hyeoksu-lee

@lee-hyeoksu you should have access to computers with GPU devices on them, like NCSA Delta or PSC Bridges2. From there we have standard procedures for building MFC. If your code does not run efficiently on GPU devices then we cannot merge it. The CI automatically checks if your code builds and runs on multiple platforms, including different GPUs.

I recommend talking to @JRChreim on how to do this, since he did it recently.

Feb 06 '24 22:02 sbryngelson

@sbryngelson I see. I can check it on Bridges2. Thanks!

Feb 06 '24 22:02 hyeoksu-lee

@sbryngelson I have updated the documentation, code comments, and doxygen docstrings. Please let me know if you want more details or if anything is written in wrong format.

Feb 09 '24 20:02 hyeoksu-lee

@lee-hyeoksu I updated your first PR comment at the top, can you go through the relevant steps?

Feb 21 '24 12:02 sbryngelson

@lee-hyeoksu I updated your first PR comment at the top, can you go through the relevant steps?

Sure! Sorry for the delay.

Feb 21 '24 20:02 hyeoksu-lee

@sbryngelson I have a question about a checklist item regarding --nsys since I have almost zero knowledge on Nsight Systems profile. I got informed a little bit from the MFC documentation though.

To my understanding, I can run any simulation with --nsys flag, and it will provide a summary of overall performance. But, the PR checklist above says "... have attached the output file (.nsys-rep) and plain text results to this PR". For this checklist, how many cases or what cases should I run with --nsys and include the results in this PR?

Feb 27 '24 07:02 hyeoksu-lee

@sbryngelson I have a question about a checklist item regarding --nsys since I have almost zero knowledge on Nsight Systems profile. I got informed a little bit from the MFC documentation though.

To my understanding, I can run any simulation with --nsys flag, and it will provide a summary of overall performance. But, the PR checklist above says "... have attached the output file (.nsys-rep) and plain text results to this PR". For this checklist, how many cases or what cases should I run with --nsys and include the results in this PR?

Hey @lee-hyeoksu -- good question, I should add more details to this template. You should only run cases that show the new feature. So, for example, a 2D case using the operator/timestep splitting would be one example and a case using the artificial Mach number. I think that covers all of your new features that could impact the simulation time/efficiency.

Yes if you run with --nsys you will get an nsight systems profile output (assuming you are running in GPU mode in a GPU node). You can append options that get passed to the nsight systems command line call if you like. They're documented here: https://docs.nvidia.com/nsight-systems/UserGuide/index.html

Feb 27 '24 13:02 sbryngelson

@sbryngelson I managed to generate .nsys-rep files, and checked them on Nsight Systems GUI. At this point, I have a few questions.

Regarding "Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR",

I have two files generated: report1.nsys-rep and report2.nsys-rep. Is it expected to have two files? If so, can I just upload all of them to the example/0d_bubblecollapse_adap, for example?
What does the "plain text results" refer to? Also, should I upload this one to the same location with .nsys-rep files?

Regarding "Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature",

do you expect me to compare the normalized wall time, like shown in the MFC documentation?

Mar 05 '24 23:03 hyeoksu-lee

@sbryngelson I managed to generate .nsys-rep files, and checked them on Nsight Systems GUI. At this point, I have a few questions.

Regarding "Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR",

I have two files generated: report1.nsys-rep and report2.nsys-rep. Is it expected to have two files? If so, can I just upload all of them to the example/0d_bubblecollapse_adap, for example?

What does the "plain text results" refer to? Also, should I upload this one to the same location with .nsys-rep files?

Regarding "Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature",

do you expect me to compare the normalized wall time, like shown in the MFC documentation?

It probably created different reports for pre_process and simulation (I would guess). The "plain text" part refers to the stdout that comes out of the command line during the profile.
You can just compare relative wall clock time difference for 2 vs. 1 GPU and 8 vs. 1 GPU. You should show this for both the master branch for a 3D base case that is basically the same as your current case with new features, as well as your cases with new features but otherwise the same problem size.

Mar 06 '24 00:03 sbryngelson

As I mentioned in Slack channel, attached are the NSYS reports and plain text output for three example cases. For all cases, the reports were generated with the options -N 1 -n 4 on a gpuA100x4 node on Delta. For each case, there are three report files generated, but the third one is larger than GitHub file size limit, so I just attached first two reports. Please let me know if you'd like to see the reports for other cases or different ranks.

0D_bubblecollapse_adap.zip 2D_mixing_arfitifical_Ma.zip 3D_weak_scaling.zip

Mar 11 '24 00:03 hyeoksu-lee

Also, the table shows the 3D_weak_scaling results for different numbers of nodes with all of the GPUs on each nodes. The final time and wall time do increase with the ranks, but there is no significant difference between current master branch (without change) and my branch.

Screenshot 2024-03-10 at 5 35 32 PM

Mar 11 '24 00:03 hyeoksu-lee

As I mentioned in Slack channel, attached are the NSYS reports and plain text output for three example cases. For all cases, the reports were generated with the options -N 1 -n 4 on a gpuA100x4 node on Delta. For each case, there are three report files generated, but the third one is larger than GitHub file size limit, so I just attached first two reports. Please let me know if you'd like to see the reports for other cases or different ranks.

0D_bubblecollapse_adap.zip 2D_mixing_arfitifical_Ma.zip 3D_weak_scaling.zip

Thanks but the third report is actually the only important one (the other two are for the syscheck and preprocess, which are irrelevant here).

Mar 11 '24 01:03 sbryngelson

As I mentioned in Slack channel, attached are the NSYS reports and plain text output for three example cases. For all cases, the reports were generated with the options -N 1 -n 4 on a gpuA100x4 node on Delta. For each case, there are three report files generated, but the third one is larger than GitHub file size limit, so I just attached first two reports. Please let me know if you'd like to see the reports for other cases or different ranks. 0D_bubblecollapse_adap.zip 2D_mixing_arfitifical_Ma.zip 3D_weak_scaling.zip

Thanks but the third report is actually the only important one (the other two are for the syscheck and preprocess, which are irrelevant here).

I see. But the report file sizes are too large to be uploaded on Github (25MB limit). Could you access to the box link below?

https://caltech.box.com/s/zb7b6h4ph1k8pfue0631r1fi7sb9s4ls

Mar 11 '24 02:03 hyeoksu-lee

You can send it to me via Slack or Email or some other mechanism. I cannot access that Box link.

Mar 11 '24 02:03 sbryngelson

@lee-hyeoksu, thanks for keeping this up to date. I will merge after @anandrdbz finishes the Frontier PR #368, a blocking change.

Mar 28 '24 23:03 sbryngelson

@sbryngelson I merged with the PR #368 and checked it works correctly on Richardson CPUs and Delta GPUs.

Apr 07 '24 02:04 hyeoksu-lee

Note: I found this https://github.com/MFlowCode/MFC/issues/390

Apr 08 '24 22:04 sbryngelson

MFC MFC copied to clipboard

Operator splitting, adaptive time stepping, and other fixes for mixing layer and bubbles

Description

Type of change

How Has This Been Tested?

Checklist:

If your code changes any code source files (anything in src/)

MFC
MFC copied to clipboard

If your code changes any code source files (anything in `src/`)