Fix non-BFB issues with latent heat in P3 on PM-GPU
This PR:
- reverts 9438a980a0caebe562568beccb9b1c4b614a390f, which changed P3 to use constants for the latent_heat variables instead of allocating 2d views during runtime.
- reimplements the main goal of those changes (no view allocations during runtime) but keeps these variables as views, now stored in the workspace manager for the monolithic version or in the memory buffer for the small kernels version.
I don't know why the previous version was non-BFB. Investigating may take some time, and PM-GPU is needed for PR testing, so I suggest merging this PR now. I've added "TODO" comments noting that these variables should eventually just be constants, and I can create an issue to track that once this is merged (if others agree). The only downside is that we keep the 3 temporary views.
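The idea of keeping the latent heat variables as views while avoiding runtime allocation can be sketched roughly as follows. This is a minimal illustration in plain C++, not the actual P3 code: `View2d`, `LatentHeatBuffer`, and `fill` are hypothetical names, `std::vector` stands in for device memory, and the fill values used with it are placeholders rather than the real physical constants.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a non-owning 2d view (ncol x nlev) into
// preallocated storage. This is NOT the Kokkos or EKAT view type.
struct View2d {
  double* data = nullptr;
  std::size_t ncol = 0;
  std::size_t nlev = 0;

  // Row-major access to entry (column i, level k).
  double& operator()(std::size_t i, std::size_t k) const {
    return data[i * nlev + k];
  }
};

// Hypothetical buffer sized once at initialization. It holds storage for
// the 3 temporary latent heat views, so no allocation happens at runtime.
class LatentHeatBuffer {
public:
  LatentHeatBuffer(std::size_t ncol, std::size_t nlev)
    : m_ncol(ncol), m_nlev(nlev), m_storage(3 * ncol * nlev) {}

  // Return the idx-th view (0, 1, or 2) as a slice of the shared storage.
  View2d view(std::size_t idx) {
    assert(idx < 3);
    return View2d{m_storage.data() + idx * m_ncol * m_nlev, m_ncol, m_nlev};
  }

private:
  std::size_t m_ncol;
  std::size_t m_nlev;
  std::vector<double> m_storage;
};

// Set every entry of a view to the same value, mimicking a view that
// carries a constant (what the TODOs would eventually replace).
inline void fill(const View2d& v, double value) {
  for (std::size_t n = 0; n < v.ncol * v.nlev; ++n) {
    v.data[n] = value;
  }
}
```

In this sketch the buffer would be constructed once during initialization, the three views fetched with `view(0)`, `view(1)`, and `view(2)` and filled once, and the runtime path would only read from them.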
Testing
I ran the following:
./cime/scripts/create_test e3sm_scream_v1 e3sm_scream_v1_long --machine pm-gpu --compiler gnugpu -c -b master -t latent_heat_pr
./cime/scripts/create_test e3sm_scream_v1_medres --machine pm-cpu --compiler=gnu -c -b master -t latent_heat_pr
All baseline comparisons passed on both CPU and GPU.
I've narrowed the problem down to the offending usage of latent_heat_fusion. I'm holding off on merging this in case the fix is simple and does not require updating baselines.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:
Pull Request Auto Testing STARTING
Build Information
Test Name: SCREAM_PullRequest_Autotester_Weaver
- Build Num: 6060
- Status: STARTED
Jenkins Parameters
| Parameter Name | Value |
|---|---|
| PR_LABELS | p3;bugfix |
| PULLREQUESTNUM | 2998 |
| SCREAM_SOURCE_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_SOURCE_SHA | fd86a15a71a670198f9410bd9fb62a7a67150255 |
| SCREAM_TARGET_BRANCH | master |
| SCREAM_TARGET_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_TARGET_SHA | 25120ff1fd4fb086176d21ebf888d3722a915bb8 |
| TEST_REPO_ALIAS | SCREAM |
Using Repos:
- Repo: SCREAM (E3SM-Project/scream)
- Branch: tcclevenger/fix_non_bfb_issues_with_latent_heat_in_p3
- SHA: fd86a15a71a670198f9410bd9fb62a7a67150255
- Mode: TEST_REPO
Pull Request Author: tcclevenger
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.
Pull Request Auto Testing has FAILED
Build Information
Test Name: SCREAM_PullRequest_Autotester_Weaver
- Build Num: 6060
- Status: FAILED
SCREAM_PullRequest_Autotester_Weaver # 6060 FAILED (last 100 lines of console output)
Warning: Permanently added the ECDSA host key for IP address '140.82.113.3' to the list of known hosts.
Submodule 'extern/Catch2' (git@github.com:E3SM-Project/Catch2) registered for path 'externals/ekat/extern/Catch2'
Submodule 'extern/kokkos' (git@github.com:E3SM-Project/kokkos) registered for path 'externals/ekat/extern/kokkos'
Submodule 'extern/spdlog' (git@github.com:gabime/spdlog.git) registered for path 'externals/ekat/extern/spdlog'
Submodule 'extern/yaml-cpp' (git@github.com:SNLComputation/yaml-cpp.git) registered for path 'externals/ekat/extern/yaml-cpp'
Cloning into '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6060/scream/externals/ekat/extern/Catch2'...
Cloning into '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6060/scream/externals/ekat/extern/kokkos'...
Cloning into '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6060/scream/externals/ekat/extern/spdlog'...
Cloning into '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6060/scream/externals/ekat/extern/yaml-cpp'...
Warning: Permanently added the ECDSA host key for IP address '140.82.114.4' to the list of known hosts.
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:SNLComputation/yaml-cpp.git' into submodule path '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6060/scream/externals/ekat/extern/yaml-cpp' failed
Failed to clone 'extern/yaml-cpp'. Retry scheduled
Cloning into '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6060/scream/externals/ekat/extern/yaml-cpp'...
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:SNLComputation/yaml-cpp.git' into submodule path '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6060/scream/externals/ekat/extern/yaml-cpp' failed
Failed to clone 'extern/yaml-cpp' a second time, aborting
Failed to recurse into submodule path 'externals/ekat'
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2846)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:2185)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl$7.lambda$execute$0(CliGitAPIImpl.java:1573)
at Jenkins v2.462.1//com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at Jenkins v2.462.1//com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at Jenkins v2.462.1//com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at Jenkins v2.462.1//com.google.common.util.concurrent.DirectExecutorService.execute(DirectExecutorService.java:51)
at java.base/java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:184)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.submitRemainingCommand(GitCommandsExecutor.java:77)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:70)
Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to weaver
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1826)
at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
at hudson.remoting.Channel.call(Channel.java:1042)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:153)
at jdk.internal.reflect.GeneratedMethodAccessor105.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:138)
at PluginClassLoader for git-client/jdk.proxy30/jdk.proxy30.$Proxy100.execute(Unknown Source)
at PluginClassLoader for git//hudson.plugins.git.extensions.impl.SubmoduleOption.onCheckoutCompleted(SubmoduleOption.java:196)
at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1388)
at hudson.scm.SCM.checkout(SCM.java:540)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1247)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:521)
at hudson.model.Run.execute(Run.java:1894)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
at hudson.model.ResourceController.execute(ResourceController.java:101)
at hudson.model.Executor.run(Executor.java:446)
Caused: hudson.plugins.git.GitException
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.checkResult(GitCommandsExecutor.java:89)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:69)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:47)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl$7.execute(CliGitAPIImpl.java:1576)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:170)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:161)
at hudson.remoting.UserRequest.perform(UserRequest.java:211)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:377)
at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused: java.io.IOException: Could not perform submodule update
at PluginClassLoader for git//hudson.plugins.git.extensions.impl.SubmoduleOption.onCheckoutCompleted(SubmoduleOption.java:201)
at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1388)
at hudson.scm.SCM.checkout(SCM.java:540)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1247)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:521)
at hudson.model.Run.execute(Run.java:1894)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
at hudson.model.ResourceController.execute(ResourceController.java:101)
at hudson.model.Executor.run(Executor.java:446)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le
cd $WORKSPACE/${BUILD_ID}/
./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins8739115007280394007.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: [email protected]
Finished: FAILURE
@tcclevenger did you add WIP to this PR because the AT wasn't working? If so, I think we can unWIP it and add the RETEST label.
No, I marked this WIP because we don't want to merge it; I was just using it to track the issue. I could close it, but I left it open in case it turns out we actually need it. I don't think that will be the case, though.
Closing this since it keeps showing up as a review request for me.
@tcclevenger is this still an issue? Should we re-open this on the E3SM side?
No, this is outdated. Ok to close.