
Cannot run ants.registration in HPC (slurm)

Open · mikami520 opened this issue 2 years ago · 14 comments

Describe the bug: When I submit my bash job script to the HPC, the process gets stuck at the ants.registration step. It seems that the forward transform and the other registration outputs cannot be saved. I created a tmp folder (the ANTsPy default) to hold these outputs, but it still did not work. Does anyone have an idea what causes this? Thanks in advance.

HPC Python Virtual Environment:

antspyx==0.3.3
certifi==2022.6.15
charset-normalizer==2.0.12
chart-studio==1.1.0
configparser==5.2.0
cycler==0.11.0
fonttools==4.33.3
idna==3.3
imageio==2.19.3
joblib==1.1.0
kiwisolver==1.4.3
matplotlib==3.5.2
networkx==2.6.3
nibabel==3.2.2
numpy==1.21.6
packaging==21.3
pandas==1.3.5
patsy==0.5.2
Pillow==9.1.1
plotly==5.8.2
PyMySQL==1.0.2
pynrrd==0.4.3
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2022.1
PyWavelets==1.3.0
PyYAML==6.0
requests==2.28.0
retrying==1.3.3
scikit-image==0.19.3
scikit-learn==1.0.2
scipy==1.7.3
shutils==0.1.0
six==1.16.0
slicerio==0.1.3
statsmodels==0.13.2
tenacity==8.0.1
threadpoolctl==3.1.0
tifffile==2021.11.2
typing_extensions==4.2.0
urllib3==1.26.9
webcolors==1.12

mikami520 · Sep 29 '22 20:09

Are you able to make a temp directory and write to it during an HPC job (completely outside an ants.registration call)?

ntustison · Sep 29 '22 20:09

Do you mean creating and writing to a temp directory from the script during an HPC job?

mikami520 · Sep 29 '22 20:09

Yes, just trying to rule out potential issues outside of anything antspy-specific.

ntustison · Sep 29 '22 20:09

Yes, that works. I created a NumPy array and saved it to the temp directory:

import numpy as np
import os

def main():
    temp_dir = '/projects/autoseg-headneck-dl/tmp'
    # Create the directory if needed; an existing directory is not an error
    os.makedirs(temp_dir, exist_ok=True)

    a = np.arange(10)
    np.save(os.path.join(temp_dir, 'test.npy'), a)
    print(f'Saved test.npy to {temp_dir}')

if __name__ == '__main__':
    main()

mikami520 · Sep 29 '22 20:09
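A slightly more targeted version of this check, as a sketch: assuming ANTsPy builds its default output prefix with Python's tempfile module when no outprefix is given (as it appears to), what matters is which directory tempfile resolves inside the job, and that lookup honours TMPDIR. The helper name check_temp_writable is hypothetical.

```python
import os
import tempfile

def check_temp_writable():
    """Return (temp dir Python resolves, whether a file can be created there)."""
    # gettempdir() checks TMPDIR, TEMP and TMP before falling back to platform defaults
    tmp = os.path.abspath(tempfile.gettempdir())
    try:
        # Create and delete a small probe file in that directory
        with tempfile.NamedTemporaryFile(dir=tmp, suffix=".nii.gz") as fh:
            fh.write(b"probe")
            fh.flush()
        return tmp, True
    except OSError:
        return tmp, False

if __name__ == "__main__":
    print(check_temp_writable())
```

Note that gettempdir() silently skips a candidate directory it cannot write to, so the useful check inside the SLURM job is whether the printed path matches TMPDIR. If it does not, ants.registration also accepts an outprefix argument, so passing a prefix inside a directory known to be writable sidesteps the default temp location.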

Okay, great. Can you post the command line call and the resulting error message?

ntustison · Sep 29 '22 21:09

#!/bin/sh
#SBATCH -J label_propagation         # job name
#SBATCH -N 1                         # nodes requested
#SBATCH -n 1                         # tasks requested
#SBATCH --partition=defq             # default queue
#SBATCH -o outfile                   # send stdout to outfile
#SBATCH -e errfile                   # send stderr to errfile
#SBATCH -w mrphpcc014
#SBATCH --export=TMPDIR=/projects/autoseg-headneck-dl/tmp
source /projects/autoseg-headneck-dl/project_env/bin/activate
python3 labelProp_reg_seg.py -ti IR_187 -si IR_187 -bp /projects/autoseg-headneck-dl -tp template -ap target -sp segmentation

Hi, there is no error message: the job just gets stuck at ants.registration, and nothing appears in the error file. ants.registration finishes when I run it on the head node, but not on a compute node.

mikami520 · Sep 29 '22 21:09

I need to see the actual ants.registration call. In fact, please reproduce the error using just a single ants.registration call (i.e., not wrapped in any private scripts). You can set the verbose option to print the output to the screen.

ntustison · Sep 29 '22 21:09

Temp directory: /projects/autoseg-headneck-dl/tmp. The script is below:

import numpy as np
import os
import ants

def main():
    template = ants.image_read('/projects/autoseg-headneck-dl/template/IR_187.nii.gz')
    target = ants.image_read('/projects/autoseg-headneck-dl/target/IR_191.nii.gz')
    transform_forward = ants.registration(fixed=template, moving=target,
                                          type_of_transform="SyN", syn_metric="demons", reg_iterations=(80, 40, 0), verbose=True)
if __name__ == '__main__':
    main()

Results are here:

All_Command_lines_OK
Using single precision for computations.
=============================================================================
The composite transform comprises the following transforms (in order): 
  1. Center of mass alignment using fixed image: 0x12d9c210 and moving image: 0x1212b850 (type = Euler3DTransform)
=============================================================================
  Reading mask(s).
    Registration stage 0
      No fixed mask
      No moving mask
    Registration stage 1
      No fixed mask
      No moving mask
  number of levels = 4
  number of levels = 3
  fixed image: 0x12d9c210
  moving image: 0x1212b850
  fixed image: 0x12d9c210
  moving image: 0x1212b850
Dimension = 3
Number of stages = 2
Use Histogram Matching true
Winsorize image intensities false
Lower quantile = 0
Upper quantile = 1
Stage 1 State
   Image metric = Mattes
     Fixed image = Image (0x1304e250)
  RTTI typeinfo:   itk::Image<float, 3u>
  Reference Count: 2
  Modified Time: 961
  Debug: Off
  Object Name: 
  Observers: 
    none
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 933
  RealTimeStamp: 0 seconds 
  LargestPossibleRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 199]
  BufferedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 199]
  RequestedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 199]
  Spacing: [0.461802, 0.461802, 0.461811]
  Origin: [-123.108, -3.66415, -200.044]
  Direction: 
0.999994 1.07282e-07 0.00345253
1.20058e-09 -1 3.07257e-05
-0.00345253 3.07255e-05 0.999994

  IndexToPointMatrix: 
0.461799 4.9543e-08 0.00159441
5.54431e-10 -0.461802 1.41895e-05
-0.00159438 1.41891e-05 0.461808

  PointToIndexMatrix: 
2.16542 2.59978e-09 -0.0074762
2.32311e-07 -2.16543 6.65339e-05
0.00747606 6.65331e-05 2.16538

  Inverse Direction: 
0.999994 1.20058e-09 -0.00345253
1.07282e-07 -1 3.07255e-05
0.00345253 3.07257e-05 0.999994

  PixelContainer: 
    ImportImageContainer (0x12a591c0)
      RTTI typeinfo:   itk::ImportImageContainer<unsigned long, float>
      Reference Count: 1
      Modified Time: 931
      Debug: Off
      Object Name: 
      Observers: 
        none
      Pointer: 0x2aab66827010
      Container manages memory: true
      Size: 52166656
      Capacity: 52166656

     Moving image = Image (0x1304aa90)
  RTTI typeinfo:   itk::Image<float, 3u>
  Reference Count: 2
  Modified Time: 962
  Debug: Off
  Object Name: 
  Observers: 
    none
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 959
  RealTimeStamp: 0 seconds 
  LargestPossibleRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 173]
  BufferedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 173]
  RequestedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 173]
  Spacing: [0.461802, 0.461802, 0.461809]
  Origin: [-135.718, -35.044, -266.268]
  Direction: 
0.999999 4.99905e-08 0.00169959
-1.18551e-05 -0.999975 0.00701237
-0.00169829 0.0070099 0.999974

  IndexToPointMatrix: 
0.461802 2.30857e-08 0.000784884
-5.47471e-06 -0.461791 0.00323837
-0.000784274 0.00323719 0.461797

  PointToIndexMatrix: 
2.16543 -2.56905e-05 -0.00368024
1.17464e-07 -2.16538 0.0151848
0.00367756 0.0151792 2.16534

  Inverse Direction: 
0.999999 -1.1864e-05 -0.00169955
5.42452e-08 -0.999975 0.00701238
0.00169833 0.00700989 0.999974

  PixelContainer: 
    ImportImageContainer (0x12f50fd0)
      RTTI typeinfo:   itk::ImportImageContainer<unsigned long, float>
      Reference Count: 1
      Modified Time: 957
      Debug: Off
      Object Name: 
      Observers: 
        none
      Pointer: 0x2aab94000010
      Container manages memory: true
      Size: 45350912
      Capacity: 45350912

     Weighting = 1
     Sampling strategy = regular
     Number of bins = 32
     Radius = 4
     Sampling percentage  = 0.2
   Transform = Affine
     Gradient step = 0.25
     Update field sigma (voxel space) = 0
     Total field sigma (voxel space) = 0
     Update field time sigma = 0
     Total field time sigma  = 0
     Number of time indices = 0
     Number of time point samples = 0
Stage 2 State
   Image metric = Demons
     Fixed image = Image (0x1304ad50)
  RTTI typeinfo:   itk::Image<float, 3u>
  Reference Count: 2
  Modified Time: 1015
  Debug: Off
  Object Name: 
  Observers: 
    none
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 987
  RealTimeStamp: 0 seconds 
  LargestPossibleRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 199]
  BufferedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 199]
  RequestedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 199]
  Spacing: [0.461802, 0.461802, 0.461811]
  Origin: [-123.108, -3.66415, -200.044]
  Direction: 
0.999994 1.07282e-07 0.00345253
1.20058e-09 -1 3.07257e-05
-0.00345253 3.07255e-05 0.999994

  IndexToPointMatrix: 
0.461799 4.9543e-08 0.00159441
5.54431e-10 -0.461802 1.41895e-05
-0.00159438 1.41891e-05 0.461808

  PointToIndexMatrix: 
2.16542 2.59978e-09 -0.0074762
2.32311e-07 -2.16543 6.65339e-05
0.00747606 6.65331e-05 2.16538

  Inverse Direction: 
0.999994 1.20058e-09 -0.00345253
1.07282e-07 -1 3.07255e-05
0.00345253 3.07257e-05 0.999994

  PixelContainer: 
    ImportImageContainer (0x12a52200)
      RTTI typeinfo:   itk::ImportImageContainer<unsigned long, float>
      Reference Count: 1
      Modified Time: 985
      Debug: Off
      Object Name: 
      Observers: 
        none
      Pointer: 0x2aab9ed01010
      Container manages memory: true
      Size: 52166656
      Capacity: 52166656

     Moving image = Image (0x1304b010)
  RTTI typeinfo:   itk::Image<float, 3u>
  Reference Count: 2
  Modified Time: 1016
  Debug: Off
  Object Name: 
  Observers: 
    none
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 1013
  RealTimeStamp: 0 seconds 
  LargestPossibleRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 173]
  BufferedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 173]
  RequestedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [512, 512, 173]
  Spacing: [0.461802, 0.461802, 0.461809]
  Origin: [-135.718, -35.044, -266.268]
  Direction: 
0.999999 4.99905e-08 0.00169959
-1.18551e-05 -0.999975 0.00701237
-0.00169829 0.0070099 0.999974

  IndexToPointMatrix: 
0.461802 2.30857e-08 0.000784884
-5.47471e-06 -0.461791 0.00323837
-0.000784274 0.00323719 0.461797

  PointToIndexMatrix: 
2.16543 -2.56905e-05 -0.00368024
1.17464e-07 -2.16538 0.0151848
0.00367756 0.0151792 2.16534

  Inverse Direction: 
0.999999 -1.1864e-05 -0.00169955
5.42452e-08 -0.999975 0.00701238
0.00169833 0.00700989 0.999974

  PixelContainer: 
    ImportImageContainer (0x12760b50)
      RTTI typeinfo:   itk::ImportImageContainer<unsigned long, float>
      Reference Count: 1
      Modified Time: 1011
      Debug: Off
      Object Name: 
      Observers: 
        none
      Pointer: 0x2aabab402010
      Container manages memory: true
      Size: 45350912
      Capacity: 45350912

     Weighting = 1
     Sampling strategy = none
     Number of bins = 32
     Radius = 4
     Sampling percentage  = 1
   Transform = SyN
     Gradient step = 0.2
     Update field sigma (voxel space) = 3
     Total field sigma (voxel space) = 0
     Update field time sigma = 0
     Total field time sigma  = 0
     Number of time indices = 0
     Number of time point samples = 0
Registration using 2 total stages.

Stage 0
  iterations = 2100x1200x1200x0
  convergence threshold = 1e-06
  convergence window size = 10
  number of levels = 4
  using the Mattes MI metric (number of bins = 32, weight = 1)
  preprocessing:  histogram matching the images
  Shrink factors (level 1 out of 4): [4, 4, 4]
  Shrink factors (level 2 out of 4): [2, 2, 2]
  Shrink factors (level 3 out of 4): [2, 2, 2]
  Shrink factors (level 4 out of 4): [1, 1, 1]
  smoothing sigmas per level: [3, 2, 1, 0]
  regular sampling (percentage = 0.2)

*** Running AffineTransform registration ***

DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
 2DIAGNOSTIC,     1, -2.813219428062e-01, inf, 1.7181e+01, 1.7181e+01, 
 2DIAGNOSTIC,     2, -2.844639718533e-01, inf, 1.7632e+01, 4.5115e-01, 
 2DIAGNOSTIC,     3, -2.904630601406e-01, inf, 1.8087e+01, 4.5444e-01, 
 2DIAGNOSTIC,     4, -3.007685244083e-01, inf, 1.8540e+01, 4.5373e-01, 
 2DIAGNOSTIC,     5, -3.183935284615e-01, inf, 1.8994e+01, 4.5346e-01, 
 2DIAGNOSTIC,     6, -3.493832051754e-01, inf, 1.9457e+01, 4.6357e-01, 
 2DIAGNOSTIC,     7, -4.168933033943e-01, inf, 1.9927e+01, 4.6960e-01, 
 2DIAGNOSTIC,     8, -5.424323081970e-01, inf, 2.0445e+01, 5.1769e-01, 
 2DIAGNOSTIC,     9, -6.123086214066e-01, inf, 2.1623e+01, 1.1789e+00, 
 2DIAGNOSTIC,    10, -6.124399900436e-01, 6.053167209029e-02, 2.2053e+01, 4.2914e-01, 
 2DIAGNOSTIC,    11, -6.135740280151e-01, 5.809864401817e-02, 2.2482e+01, 4.2980e-01, 
 2DIAGNOSTIC,    12, -6.156906485558e-01, 5.146318674088e-02, 2.2910e+01, 4.2716e-01, 
 2DIAGNOSTIC,    13, -6.190714240074e-01, 4.204098507762e-02, 2.3338e+01, 4.2835e-01, 
 2DIAGNOSTIC,    14, -6.254795193672e-01, 3.137146309018e-02, 2.3772e+01, 4.3427e-01, 
 2DIAGNOSTIC,    15, -6.388834118843e-01, 2.104608155787e-02, 2.4209e+01, 4.3645e-01, 
 2DIAGNOSTIC,    16, -6.665989160538e-01, 1.265268027782e-02, 2.4648e+01, 4.3902e-01, 
 2DIAGNOSTIC,    17, -6.937538981438e-01, 7.488992065191e-03, 2.5163e+01, 5.1531e-01, 
 2DIAGNOSTIC,    18, -7.076045274734e-01, 6.665471475571e-03, 2.5609e+01, 4.4602e-01, 
 2DIAGNOSTIC,    19, -7.342540025711e-01, 8.111903443933e-03, 2.6249e+01, 6.4006e-01, 
 2DIAGNOSTIC,    20, -7.409344315529e-01, 8.784332312644e-03, 2.6754e+01, 5.0549e-01, 
 2DIAGNOSTIC,    21, -7.413212060928e-01, 8.599292486906e-03, 2.7193e+01, 4.3819e-01, 
 2DIAGNOSTIC,    22, -7.437060475349e-01, 7.753079291433e-03, 2.7630e+01, 4.3740e-01, 
 2DIAGNOSTIC,    23, -7.443974614143e-01, 6.380322389305e-03, 2.8202e+01, 5.7184e-01, 
 2DIAGNOSTIC,    24, -7.445874214172e-01, 4.702513106167e-03, 2.8641e+01, 4.3884e-01, 
 2DIAGNOSTIC,    25, -7.453178763390e-01, 3.033364191651e-03, 2.9079e+01, 4.3822e-01, 
 2DIAGNOSTIC,    26, -7.473825216293e-01, 1.801685779355e-03, 2.9516e+01, 4.3700e-01, 
 2DIAGNOSTIC,    27, -7.498279809952e-01, 1.046763849445e-03, 3.0151e+01, 6.3523e-01, 
 2DIAGNOSTIC,    28, -7.509226202965e-01, 5.051401094534e-04, 3.0587e+01, 4.3616e-01, 
 2DIAGNOSTIC,    29, -7.514142394066e-01, 4.048690316267e-04, 3.1026e+01, 4.3881e-01, 
 2DIAGNOSTIC,    30, -7.520393133163e-01, 3.902903990820e-04, 3.1597e+01, 5.7106e-01, 
 2DIAGNOSTIC,    31, -7.523587346077e-01, 3.494879638311e-04, 3.2168e+01, 5.7095e-01, 
 2DIAGNOSTIC,    32, -7.525413632393e-01, 3.154476289637e-04, 3.2671e+01, 5.0248e-01, 
 2DIAGNOSTIC,    33, -7.532699108124e-01, 2.743243239820e-04, 3.3103e+01, 4.3283e-01, 
 2DIAGNOSTIC,    34, -7.621305584908e-01, 3.320676914882e-04, 3.3661e+01, 5.5771e-01, 
 2DIAGNOSTIC,    35, -7.738839387894e-01, 5.126833566464e-04, 3.4152e+01, 4.9115e-01, 
 2DIAGNOSTIC,    36, -7.797993421555e-01, 7.289141649380e-04, 3.4576e+01, 4.2410e-01, 
 2DIAGNOSTIC,    37, -7.869144082069e-01, 9.784795111045e-04, 3.5064e+01, 4.8739e-01, 
 2DIAGNOSTIC,    38, -7.890673279762e-01, 1.157402992249e-03, 3.5550e+01, 4.8627e-01, 
 2DIAGNOSTIC,    39, -7.902623414993e-01, 1.233585411683e-03, 3.6039e+01, 4.8862e-01, 
 2DIAGNOSTIC,    40, -7.916049361229e-01, 1.208236208186e-03, 3.6526e+01, 4.8770e-01, 
 2DIAGNOSTIC,    41, -7.923331856728e-01, 1.080825575627e-03, 3.7014e+01, 4.8799e-01, 
 2DIAGNOSTIC,    42, -7.923943996429e-01, 8.621836313978e-04, 3.7502e+01, 4.8751e-01, 
 2DIAGNOSTIC,    43, -7.924139499664e-01, 5.809061694890e-04, 3.7925e+01, 4.2297e-01, 
 2DIAGNOSTIC,    44, -7.925332784653e-01, 3.460146544967e-04, 3.8542e+01, 6.1733e-01, 
 2DIAGNOSTIC,    45, -7.925949692726e-01, 2.025420981226e-04, 3.9160e+01, 6.1760e-01, 
 2DIAGNOSTIC,    46, -7.925862669945e-01, 1.004639780149e-04, 3.9647e+01, 4.8742e-01, 
 2DIAGNOSTIC,    47, -7.925720810890e-01, 5.803947715322e-05, 4.0072e+01, 4.2509e-01, 
 2DIAGNOSTIC,    48, -7.925559282303e-01, 3.104559800704e-05, 4.0561e+01, 4.8896e-01, 
 2DIAGNOSTIC,    49, -7.925549745560e-01, 1.247694399353e-05, 4.1115e+01, 5.5370e-01, 
 2DIAGNOSTIC,    50, -7.925552129745e-01, 4.771909516421e-06, 4.1538e+01, 4.2308e-01, 
 2DIAGNOSTIC,    51, -7.925056815147e-01, 2.613708602439e-06, 4.2028e+01, 4.8965e-01, 
 2DIAGNOSTIC,    52, -7.925310134888e-01, 1.096330947803e-06, 4.2452e+01, 4.2409e-01, 
DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
 2DIAGNOSTIC,     1, -7.333807349205e-01, inf, 6.5371e+01, 2.2920e+01, 
 2DIAGNOSTIC,     2, -7.333887815475e-01, inf, 6.9358e+01, 3.9869e+00, 
 2DIAGNOSTIC,     3, -7.333895564079e-01, inf, 7.3810e+01, 4.4521e+00, 
 2DIAGNOSTIC,     4, -7.333821058273e-01, inf, 7.7327e+01, 3.5165e+00, 
 2DIAGNOSTIC,     5, -7.333791255951e-01, inf, 8.1313e+01, 3.9858e+00, 
 2DIAGNOSTIC,     6, -7.333782911301e-01, inf, 8.4828e+01, 3.5155e+00, 
 2DIAGNOSTIC,     7, -7.333780527115e-01, inf, 8.8338e+01, 3.5093e+00, 
 2DIAGNOSTIC,     8, -7.333742380142e-01, inf, 9.2321e+01, 3.9837e+00, 
 2DIAGNOSTIC,     9, -7.333755493164e-01, inf, 9.5835e+01, 3.5132e+00, 
 2DIAGNOSTIC,    10, -7.333749532700e-01, 2.013426637859e-06, 9.9354e+01, 3.5196e+00, 
 2DIAGNOSTIC,    11, -7.333688735962e-01, 1.497073981227e-06, 1.0288e+02, 3.5218e+00, 
 2DIAGNOSTIC,    12, -7.333683967590e-01, 1.437878722754e-06, 1.0780e+02, 4.9230e+00, 
 2DIAGNOSTIC,    13, -7.333684563637e-01, 1.526806272523e-06, 1.1085e+02, 3.0499e+00, 
 2DIAGNOSTIC,    14, -7.333676218987e-01, 1.456138988942e-06, 1.1438e+02, 3.5273e+00, 
 2DIAGNOSTIC,    15, -7.333676218987e-01, 1.392485728502e-06, 1.1836e+02, 3.9842e+00, 
 2DIAGNOSTIC,    16, -7.333675026894e-01, 1.382393861604e-06, 1.2188e+02, 3.5167e+00, 
 2DIAGNOSTIC,    17, -7.333671450615e-01, 1.418763986294e-06, 1.2540e+02, 3.5187e+00, 
 2DIAGNOSTIC,    18, -7.333673238754e-01, 1.411779294358e-06, 1.2891e+02, 3.5139e+00, 
 2DIAGNOSTIC,    19, -7.333672046661e-01, 1.461526039748e-06, 1.3289e+02, 3.9850e+00, 
 2DIAGNOSTIC,    20, -7.333611249924e-01, 1.400061137247e-06, 1.3688e+02, 3.9844e+00, 
 2DIAGNOSTIC,    21, -7.333610653877e-01, 1.252205038327e-06, 1.4039e+02, 3.5158e+00, 
 2DIAGNOSTIC,    22, -7.333609461784e-01, 1.130404143623e-06, 1.4391e+02, 3.5149e+00, 
 2DIAGNOSTIC,    23, -7.333610057831e-01, 1.049445700119e-06, 1.4743e+02, 3.5161e+00, 
DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
 2DIAGNOSTIC,     1, -6.754842996597e-01, inf, 1.7379e+02, 2.6364e+01, 
 2DIAGNOSTIC,     2, -6.754698157310e-01, inf, 1.7738e+02, 3.5926e+00, 
 2DIAGNOSTIC,     3, -6.754652857780e-01, inf, 1.8051e+02, 3.1294e+00, 
 2DIAGNOSTIC,     4, -6.754665374756e-01, inf, 1.8412e+02, 3.6037e+00, 
 2DIAGNOSTIC,     5, -6.754652857780e-01, inf, 1.8818e+02, 4.0670e+00, 
 2DIAGNOSTIC,     6, -6.754544973373e-01, inf, 1.9226e+02, 4.0750e+00, 
 2DIAGNOSTIC,     7, -6.754555702209e-01, inf, 1.9585e+02, 3.5952e+00, 
 2DIAGNOSTIC,     8, -6.754534840584e-01, inf, 1.9945e+02, 3.6002e+00, 
 2DIAGNOSTIC,     9, -6.754514575005e-01, inf, 2.0447e+02, 5.0126e+00, 
DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
 2DIAGNOSTIC,     1, -6.335421204567e-01, inf, 2.6203e+02, 5.7569e+01, 
  Elapsed time (stage 0): 2.7086e+02


Stage 1
  iterations = 80x40x0
  convergence threshold = 1.0000e-07
  convergence window size = 8
  number of levels = 3
  using the Demons metric (weight = 1.0000e+00)
  preprocessing:  histogram matching the images
  Shrink factors (level 1 out of 3): [4, 4, 4]
  Shrink factors (level 2 out of 3): [2, 2, 2]
  Shrink factors (level 3 out of 3): [1, 1, 1]
  smoothing sigmas per level: [2, 1, 0]
  Using default NONE metricSamplingStrategy 

*** Running SyN registration (varianceForUpdateField = 3.0000e+00, varianceForTotalField = 0.0000e+00) ***

XXDIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
 1DIAGNOSTIC,     1, 6.578126922250e-03, inf, 2.0323e+01, 2.0323e+01, 
 1DIAGNOSTIC,     2, 6.198876537383e-03, inf, 2.4173e+01, 3.8498e+00, 
 1DIAGNOSTIC,     3, 5.858600605279e-03, inf, 2.8237e+01, 4.0645e+00, 
 1DIAGNOSTIC,     4, 5.561163183302e-03, inf, 3.2309e+01, 4.0722e+00, 
 1DIAGNOSTIC,     5, 5.306666716933e-03, inf, 3.6383e+01, 4.0737e+00, 
 1DIAGNOSTIC,     6, 5.093042273074e-03, inf, 4.0517e+01, 4.1334e+00, 
 1DIAGNOSTIC,     7, 4.908734932542e-03, inf, 4.4583e+01, 4.0664e+00, 
 1DIAGNOSTIC,     8, 4.750644788146e-03, 2.683306112885e-02, 4.8675e+01, 4.0924e+00, 
 1DIAGNOSTIC,     9, 4.614279139787e-03, 2.094817720354e-02, 5.2757e+01, 4.0811e+00, 
 1DIAGNOSTIC,    10, 4.489870741963e-03, 1.648759841919e-02, 5.6855e+01, 4.0987e+00, 
 1DIAGNOSTIC,    11, 4.376789554954e-03, 1.316211745143e-02, 6.1046e+01, 4.1903e+00, 
 1DIAGNOSTIC,    12, 4.273286089301e-03, 1.071105618030e-02, 6.5238e+01, 4.1922e+00, 
 1DIAGNOSTIC,    13, 4.179042298347e-03, 8.898347616196e-03, 6.9456e+01, 4.2183e+00, 
 1DIAGNOSTIC,    14, 4.094402771443e-03, 7.513524964452e-03, 7.3767e+01, 4.3108e+00, 
 1DIAGNOSTIC,    15, 4.016762599349e-03, 6.434714421630e-03, 7.8104e+01, 4.3370e+00, 
 1DIAGNOSTIC,    16, 3.946381155401e-03, 5.559334997088e-03, 8.2436e+01, 4.3318e+00, 
 1DIAGNOSTIC,    17, 3.877554554492e-03, 4.836757667363e-03, 8.6772e+01, 4.3366e+00, 
 1DIAGNOSTIC,    18, 3.813719609752e-03, 4.242306575179e-03, 9.1089e+01, 4.3168e+00, 
 1DIAGNOSTIC,    19, 3.756330581382e-03, 3.744673682377e-03, 9.5530e+01, 4.4412e+00, 
 1DIAGNOSTIC,    20, 3.696973901242e-03, 3.349890699610e-03, 9.9974e+01, 4.4436e+00, 
 1DIAGNOSTIC,    21, 3.639972535893e-03, 3.038992406800e-03, 1.0442e+02, 4.4469e+00, 
 1DIAGNOSTIC,    22, 3.586101345718e-03, 2.785658929497e-03, 1.0889e+02, 4.4715e+00, 
 1DIAGNOSTIC,    23, 3.535091876984e-03, 2.572475699708e-03, 1.1344e+02, 4.5518e+00, 
 1DIAGNOSTIC,    24, 3.487749956548e-03, 2.376453252509e-03, 1.1804e+02, 4.5942e+00, 
 1DIAGNOSTIC,    25, 3.443661611527e-03, 2.197061199695e-03, 1.2249e+02, 4.4565e+00, 
 1DIAGNOSTIC,    26, 3.402639180422e-03, 2.024127170444e-03, 1.2696e+02, 4.4690e+00, 
 1DIAGNOSTIC,    27, 3.365724114701e-03, 1.841247314587e-03, 1.3143e+02, 4.4681e+00, 
 1DIAGNOSTIC,    28, 3.328551771119e-03, 1.674149301834e-03, 1.3603e+02, 4.5994e+00, 
 1DIAGNOSTIC,    29, 3.290897700936e-03, 1.533278496936e-03, 1.4062e+02, 4.5934e+00, 
 1DIAGNOSTIC,    30, 3.255779622123e-03, 1.415321836248e-03, 1.4521e+02, 4.5803e+00, 
 1DIAGNOSTIC,    31, 3.221598919481e-03, 1.320389797911e-03, 1.4979e+02, 4.5814e+00, 
 1DIAGNOSTIC,    32, 3.188668750226e-03, 1.242642058060e-03, 1.5449e+02, 4.6993e+00, 
 1DIAGNOSTIC,    33, 3.158426145092e-03, 1.172390300781e-03, 1.5919e+02, 4.6997e+00, 
 1DIAGNOSTIC,    34, 3.129208460450e-03, 1.105471397750e-03, 1.6390e+02, 4.7116e+00, 
 1DIAGNOSTIC,    35, 3.101278329268e-03, 1.034480519593e-03, 1.6862e+02, 4.7260e+00, 
 1DIAGNOSTIC,    36, 3.072457155213e-03, 9.692877065390e-04, 1.7334e+02, 4.7195e+00, 
 1DIAGNOSTIC,    37, 3.045512363315e-03, 9.127996745519e-04, 1.7808e+02, 4.7382e+00, 
 1DIAGNOSTIC,    38, 3.020318225026e-03, 8.602522430010e-04, 1.8290e+02, 4.8203e+00, 
 1DIAGNOSTIC,    39, 2.994478680193e-03, 8.158217533492e-04, 1.8773e+02, 4.8321e+00, 
 1DIAGNOSTIC,    40, 2.972299000248e-03, 7.722107111476e-04, 1.9278e+02, 5.0499e+00, 
 1DIAGNOSTIC,    41, 2.948460634798e-03, 7.316091796383e-04, 1.9772e+02, 4.9358e+00, 
 1DIAGNOSTIC,    42, 2.926714951172e-03, 6.910879747011e-04, 2.0281e+02, 5.0922e+00, 
 1DIAGNOSTIC,    43, 2.902568085119e-03, 6.570639670826e-04, 2.0774e+02, 4.9337e+00, 
 1DIAGNOSTIC,    44, 2.881729742512e-03, 6.274841725826e-04, 2.1280e+02, 5.0596e+00, 
 1DIAGNOSTIC,    45, 2.860515611246e-03, 6.020836881362e-04, 2.1775e+02, 4.9451e+00, 
 1DIAGNOSTIC,    46, 2.839983208105e-03, 5.777952028438e-04, 2.2269e+02, 4.9454e+00, 
 1DIAGNOSTIC,    47, 2.822571666911e-03, 5.520061822608e-04, 2.2774e+02, 5.0475e+00, 
 1DIAGNOSTIC,    48, 2.804308431223e-03, 5.219482118264e-04, 2.3269e+02, 4.9486e+00, 
 1DIAGNOSTIC,    49, 2.787145320326e-03, 4.920735955238e-04, 2.3774e+02, 5.0532e+00, 
 1DIAGNOSTIC,    50, 2.770079998299e-03, 4.614342469722e-04, 2.4281e+02, 5.0612e+00, 
 1DIAGNOSTIC,    51, 2.754636341706e-03, 4.352111136541e-04, 2.4800e+02, 5.1913e+00, 
 1DIAGNOSTIC,    52, 2.739454386756e-03, 4.096098709852e-04, 2.5317e+02, 5.1742e+00, 
 1DIAGNOSTIC,    53, 2.725338097662e-03, 3.861439181492e-04, 2.5835e+02, 5.1801e+00, 
 1DIAGNOSTIC,    54, 2.710175234824e-03, 3.681863890961e-04, 2.6364e+02, 5.2939e+00, 
 1DIAGNOSTIC,    55, 2.695507137105e-03, 3.510272072162e-04, 2.6907e+02, 5.4278e+00, 
 1DIAGNOSTIC,    56, 2.679283265024e-03, 3.407034964766e-04, 2.7437e+02, 5.2966e+00, 
 1DIAGNOSTIC,    57, 2.665875712410e-03, 3.310627362225e-04, 2.7971e+02, 5.3409e+00, 
 1DIAGNOSTIC,    58, 2.655114978552e-03, 3.183979424648e-04, 2.8511e+02, 5.3995e+00, 
 1DIAGNOSTIC,    59, 2.641850383952e-03, 3.055073611904e-04, 2.9050e+02, 5.3923e+00, 
 1DIAGNOSTIC,    60, 2.631305018440e-03, 2.886845613830e-04, 2.9604e+02, 5.5333e+00, 
 1DIAGNOSTIC,    61, 2.618490718305e-03, 2.722183126025e-04, 3.0154e+02, 5.5034e+00, 
 1DIAGNOSTIC,    62, 2.606740221381e-03, 2.580231230240e-04, 3.0726e+02, 5.7240e+00, 
 1DIAGNOSTIC,    63, 2.595708938316e-03, 2.456069341861e-04, 3.1272e+02, 5.4526e+00, 
 1DIAGNOSTIC,    64, 2.584427362308e-03, 2.392040478298e-04, 3.1847e+02, 5.7558e+00, 
 1DIAGNOSTIC,    65, 2.574067562819e-03, 2.334236924071e-04, 3.2389e+02, 5.4178e+00, 
 1DIAGNOSTIC,    66, 2.563637681305e-03, 2.246854273835e-04, 3.2975e+02, 5.8567e+00, 
 1DIAGNOSTIC,    67, 2.554278355092e-03, 2.163149620173e-04, 3.3549e+02, 5.7426e+00, 
 1DIAGNOSTIC,    68, 2.545834053308e-03, 2.036867663264e-04, 3.4112e+02, 5.6312e+00, 
 1DIAGNOSTIC,    69, 2.535880310461e-03, 1.940799993463e-04, 3.4708e+02, 5.9605e+00, 
 1DIAGNOSTIC,    70, 2.525974297896e-03, 1.868436083896e-04, 3.5330e+02, 6.2159e+00, 
 1DIAGNOSTIC,    71, 2.517260145396e-03, 1.799753517844e-04, 3.5983e+02, 6.5310e+00, 
 1DIAGNOSTIC,    72, 2.509122714400e-03, 1.738253922667e-04, 3.6633e+02, 6.5068e+00, 
 1DIAGNOSTIC,    73, 2.500902162865e-03, 1.677125401329e-04, 3.7284e+02, 6.5044e+00, 
 1DIAGNOSTIC,    74, 2.492375439033e-03, 1.627938327147e-04, 3.7935e+02, 6.5126e+00, 
 1DIAGNOSTIC,    75, 2.483481075615e-03, 1.586566650076e-04, 3.8603e+02, 6.6772e+00, 
 1DIAGNOSTIC,    76, 2.473445143551e-03, 1.559523952892e-04, 3.9281e+02, 6.7781e+00, 
 1DIAGNOSTIC,    77, 2.465295605361e-03, 1.542438112665e-04, 3.9956e+02, 6.7515e+00, 
 1DIAGNOSTIC,    78, 2.455571200699e-03, 1.556767238071e-04, 4.0629e+02, 6.7351e+00, 
 1DIAGNOSTIC,    79, 2.447751350701e-03, 1.554992486490e-04, 4.1314e+02, 6.8526e+00, 
 1DIAGNOSTIC,    80, 2.436448354274e-03, 1.574752386659e-04, 4.2013e+02, 6.9856e+00, 
XXDIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
 1DIAGNOSTIC,     1, 3.196502570063e-03, inf, 4.9789e+02, 7.7760e+01, 
 1DIAGNOSTIC,     2, 3.092057071626e-03, inf, 5.5467e+02, 5.6775e+01, 
 1DIAGNOSTIC,     3, 2.978311618790e-03, inf, 6.1268e+02, 5.8011e+01, 
 1DIAGNOSTIC,     4, 2.883156761527e-03, inf, 6.7050e+02, 5.7823e+01, 

It is still in progress; this is the output so far.

mikami520 · Sep 29 '22 21:09

The SyN step takes a very long time, which is strange because it runs quickly on my local laptop.

mikami520 · Sep 29 '22 22:09
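To compare the HPC run with the laptop run using numbers rather than impressions, here is a small sketch that parses the verbose DIAGNOSTIC lines shown above and averages the SINCE_LAST column (seconds since the previous iteration) for each resolution level. The column layout is taken from the log in this thread; mean_seconds_per_iteration is a hypothetical name.

```python
def mean_seconds_per_iteration(lines):
    """Average SINCE_LAST per DIAGNOSTIC block in an ANTs verbose log (one value per level)."""
    levels = []
    for line in lines:
        text = line.strip()
        if "DIAGNOSTIC,Iteration" in text:
            # Header row (e.g. "XXDIAGNOSTIC,Iteration,...") starts a new level
            levels.append([])
        elif "DIAGNOSTIC," in text and levels:
            fields = [f.strip() for f in text.split(",")]
            # Columns: tag, Iteration, metricValue, convergenceValue,
            # ITERATION_TIME_INDEX, SINCE_LAST
            levels[-1].append(float(fields[5]))
    return [sum(ts) / len(ts) if ts else 0.0 for ts in levels]

if __name__ == "__main__":
    import sys
    # Usage: python parse_log.py < registration_log.txt
    for i, secs in enumerate(mean_seconds_per_iteration(sys.stdin.read().splitlines()), 1):
        print(f"block {i}: {secs:.2f} s/iteration")
```

The first row of each level also includes that level's setup time, so it inflates the average somewhat. Per-iteration cost is expected to grow between levels (shrink factor 2 has 8 times as many voxels as shrink factor 4), so the meaningful comparison is the same level on the laptop versus the cluster.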

This sounds to me like resource restrictions on the jobs as being submitted to SLURM (low CPU count and RAM). I suggest contacting your HPC support for help with requesting appropriate resources.

gdevenyi · Sep 29 '22 22:09
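One way to check this from inside the job, as a rough sketch: print what SLURM reports as allocated next to what the process can actually use. The SLURM_* names are standard sbatch output environment variables, but which ones are set depends on how the job was submitted; job_resources is a hypothetical helper, and os.sched_getaffinity is Linux-only.

```python
import os

def job_resources(env=None):
    """Collect what SLURM allocated and what this process can use (hypothetical helper)."""
    env = os.environ if env is None else env
    # Linux-only: CPUs this process may run on (typically reflects cpuset limits)
    usable = len(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else None
    return {
        "node": env.get("SLURMD_NODENAME"),
        "cpus_per_task": env.get("SLURM_CPUS_PER_TASK"),   # unset unless --cpus-per-task was requested
        "cpus_on_node": env.get("SLURM_CPUS_ON_NODE"),
        "mem_per_node_mb": env.get("SLURM_MEM_PER_NODE"),  # typically set when --mem is given
        "usable_cpus": usable,
        "machine_cpus": os.cpu_count(),                     # all CPUs on the node, allocated or not
        "itk_threads": env.get("ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"),
    }

if __name__ == "__main__":
    for key, value in job_resources().items():
        print(f"{key}: {value}")
```

If usable_cpus is much smaller than machine_cpus, the job is confined to a few cores, and a library that sizes its thread pool from the whole machine may oversubscribe them.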

Are you controlling the number of threads? If not, the job might try to use all the CPUs in the HPC, which can make things much slower. Try setting the environment variable ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=N, where your job reserves N CPUs, before running the Python code.

cookpa · Sep 29 '22 22:09
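If you would rather set the variable from Python than in the job script, a minimal sketch is to set it before ANTsPy is imported, on the assumption that ITK reads ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS when its global thread default is first initialized. It uses SLURM_CPUS_PER_TASK when set and otherwise falls back to the CPUs the process may run on; itk_thread_count and configure_itk_threads are hypothetical helpers.

```python
import os

def itk_thread_count(env=None):
    """Threads to give ITK: SLURM's per-task CPU count, else the CPUs this process may use."""
    env = os.environ if env is None else env
    requested = env.get("SLURM_CPUS_PER_TASK", "")
    if requested.isdigit() and int(requested) > 0:
        return int(requested)
    if hasattr(os, "sched_getaffinity"):  # Linux-only
        return max(1, len(os.sched_getaffinity(0)))
    return 1

def configure_itk_threads(env=None):
    """Set ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS for this process and return the value."""
    n = itk_thread_count(env)
    os.environ["ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"] = str(n)
    return n

if __name__ == "__main__":
    n = configure_itk_threads()
    print(f"ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS={n}")
    # Import ANTsPy only after this point (e.g. `import ants`) so ITK sees the value.
```

Outside SLURM the fallback may report every core on the machine, so on a shared login node it is safer to pass an explicit value.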

@cookpa Hi, based on your suggestion, I added one line to my job script:

#!/bin/sh
#SBATCH -J label_propagation         # job name
#SBATCH -N 1                         # nodes requested
#SBATCH -n 1                         # tasks requested
#SBATCH --partition=defq             # default queue
#SBATCH -o outfile                   # send stdout to outfile
#SBATCH -e errfile                   # send stderr to errfile
#SBATCH -w mrphpcc014
#SBATCH --export=TMPDIR=/projects/autoseg-headneck-dl/tmp

# #SBATCH lines are not expanded by the shell, so read $SLURM_CPUS_PER_TASK here instead
export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=${SLURM_CPUS_PER_TASK:-1}

source /projects/autoseg-headneck-dl/project_env/bin/activate
python3 test_temp.py

And I submit it with sbatch --cpus-per-task=8 unittest.sh, but the job always stays in the Pending state instead of Running. Do you have any idea why?

mikami520 · Sep 29 '22 22:09

@gdevenyi The job does finish, but it takes a long time. I agree with your guess about restricted CPU resources.

mikami520 · Sep 29 '22 22:09

This is exactly the time to ask your HPC support; the Pending state is unrelated to ANTs.

gdevenyi · Sep 29 '22 23:09