run info block for running on cluster

Open mandd opened this issue 6 years ago • 2 comments


Issue Description

Provided this RunInfo block:

  <RunInfo>
    <WorkingDir>/home/mandd/projects/ATF2018/cr</WorkingDir>
    <Sequence>Run</Sequence>
    <batchSize>512</batchSize>
    <maxQueueSize>512</maxQueueSize>
    <expectedTime>24:00:00</expectedTime>
    <RemoteRunCommand>raven_qsub_command_legacy.sh</RemoteRunCommand>
    <mode>mpi<runQSUB/>
      <memory>12gb</memory>
    </mode>
    <clusterParameters>-P lwrs</clusterParameters>
    <deleteOutExtension>.plt,.o,plotfl,restrt</deleteOutExtension>
  </RunInfo>

What did you expect to see happen?

It should run.

What did you see instead?

The code returns this error:

  ( 0.06 sec) SIMULATION : DEBUG -> Moving to working directory: /home/mandd/projects/ATF2018/Zr
  ( 0.06 sec) MPI SIMULATION MODE : Message -> precommand: mpiexec -n 1 , postcommand:
  ( 0.06 sec) Job Handler : DEBUG -> Setting maxQueueSize to 512
  ( 0.06 sec) SIMULATION : DEBUG -> entering the run
  Traceback (most recent call last):
    File "/home/mandd/projects/raven/framework/Driver.py", line 281, in <module>
      raven()
    File "/home/mandd/projects/raven/framework/Driver.py", line 234, in raven
      simulation.run()
    File "/home/mandd/projects/raven/framework/Simulation.py", line 740, in run
      remoteRunCommand = self.__modeHandler.remoteRunCommand(dict(self.runInfoDict))
    File "/home/mandd/projects/raven/framework/CustomModes/MPISimulationMode.py", line 183, in remoteRunCommand
      return self.__createAndRunQSUB(runInfoDict)
    File "/home/mandd/projects/raven/framework/CustomModes/MPISimulationMode.py", line 166, in __createAndRunQSUB
      remoteRunCommand["cwd"] = runInfoDict['InputDir']
  KeyError: u'InputDir'

Do you have a suggested fix for the development team?

The issue appears to be related to the absolute path. The error disappears if <WorkingDir>/home/mandd/projects/ATF2018/cr</WorkingDir> is changed to <WorkingDir>./</WorkingDir>
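
The traceback ends at `runInfoDict['InputDir']`, so the immediate failure is a plain dictionary lookup on a key that was never set. As a minimal sketch only, a defensive lookup could look like the following. The helper name `resolve_qsub_cwd` and the fallback to `WorkingDir` are assumptions made for illustration; this is not RAVEN's actual code, and whether such a fallback is the right fix has not been established in this thread.

```python
import posixpath  # cluster paths in this report are POSIX-style


def resolve_qsub_cwd(run_info):
    """Return the directory a QSUB job should start in.

    Hypothetical helper: prefers 'InputDir' and falls back to an absolute
    'WorkingDir' when 'InputDir' is missing, instead of raising right away.
    """
    input_dir = run_info.get("InputDir")
    if input_dir is not None:
        return input_dir
    working_dir = run_info.get("WorkingDir", "")
    if posixpath.isabs(working_dir):
        return working_dir
    # Nothing usable: keep the original failure mode so the problem stays visible.
    raise KeyError("InputDir")


# Stand-in for runInfoDict, using the absolute path from this report.
run_info = {"WorkingDir": "/home/mandd/projects/ATF2018/cr"}
print(resolve_qsub_cwd(run_info))
```

Using `dict.get` rather than indexing keeps the lookup from failing when the key is absent, while the final `raise` preserves the existing error for cases the fallback does not cover.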

Please attach the input file(s) that generate this error. The simpler the input, the faster we can find the issue.

For Change Control Board: Issue Review

This review should occur before any development is performed as a response to this issue.

  • [x] 1. Is it tagged with a type: defect or improvement?
  • [x] 2. Is it tagged with a priority: critical, normal or minor?
  • [x] 3. If it will impact requirements or requirements tests, is it tagged with requirements?
  • [x] 4. If it is a defect, can it cause wrong results for users? If so an email needs to be sent to the users.
  • [ ] 5. Is a rationale provided? (Such as explaining why the improvement is needed or why current code is wrong.)

For Change Control Board: Issue Closure

This review should occur when the issue is imminently going to be closed.

  • [ ] 1. If the issue is a defect, is the defect fixed?
  • [ ] 2. If the issue is a defect, is the defect tested for in the regression test system? (If not explain why not.)
  • [ ] 3. If the issue can impact users, has an email to the users group been written (the email should specify if the defect impacts stable or master)?
  • [ ] 4. If the issue is a defect, does it impact the latest stable branch? If yes, is there any issue tagged with stable (create if needed)?
  • [ ] 5. If the issue is being closed without a merge request, has an explanation of why it is being closed been provided?

mandd, Sep 16 '18 23:09

@mandd This issue is not clear to me. Could you add more details?

wangcj05, Jul 07 '22 22:07

I think issue #938 can be combined with this issue. The following is from issue #938:

When using an absolute path in the <WorkingDir> node together with QSUB, the code crashes with a KeyError indicating that "InputDir" is not found.

This is the <RunInfo> creating the issue:

  <RunInfo>
    <WorkingDir>/home/xxx/test</WorkingDir>
    <Sequence> RunXXX </Sequence>
    <batchSize>2</batchSize> <!-- outer -->
    <NumMPI>20</NumMPI> <!-- inner -->
    <internalParallel>False</internalParallel>
    <mode>
      mpi
      <runQSUB/>
      <memory>6gb</memory>
    </mode>
    <clusterParameters>-P xxx</clusterParameters>
    <expectedTime>12:00:00</expectedTime>
  </RunInfo>
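
Until the defect is fixed, the combination described above (an absolute <WorkingDir> together with <runQSUB/>) can be detected before launching a run. The following is a rough sketch using only the Python standard library; the function name `qsub_with_absolute_workdir` is hypothetical and not part of RAVEN, and the check covers only the two nodes named in this report.

```python
import posixpath  # cluster paths in this report are POSIX-style
import xml.etree.ElementTree as ET


def qsub_with_absolute_workdir(run_info_xml):
    """Return True if the block uses <runQSUB/> and an absolute <WorkingDir>."""
    root = ET.fromstring(run_info_xml)
    working_dir = (root.findtext("WorkingDir") or "").strip()
    uses_qsub = root.find("mode/runQSUB") is not None
    return uses_qsub and posixpath.isabs(working_dir)


# Trimmed copy of the <RunInfo> block quoted above.
example = """
<RunInfo>
  <WorkingDir>/home/xxx/test</WorkingDir>
  <Sequence> RunXXX </Sequence>
  <mode>
    mpi
    <runQSUB/>
  </mode>
</RunInfo>
"""

print(qsub_with_absolute_workdir(example))  # True
```

A block that uses <WorkingDir>./</WorkingDir> instead makes the function return False, which matches the workaround reported in the original issue.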

wangcj05, Jul 08 '22 00:07