
Unexpected behavior of antsMultivariateTemplateConstruction2.sh

Open valeryozenne opened this issue 3 years ago • 11 comments

Hi,

I can't find a solution to the following issue. If you have any clues, thanks in advance.

I'm building a template with 'antsMultivariateTemplateConstruction2.sh' from 12 sets of images. For some reason, one job (always the same one) is not put in the queue, although this is not fully reproducible when I try several times.

Screenshot from 2022-04-21 10-19-42

Looking more closely:

  • job11_r.sh has been written but does not have the same permission level as the other job scripts, so it seems chmod was not applied to it
  • job_11_0_metriclog.txt has been created
  • job_11_metriclog.txt doesn't exist, as the job was not launched
  • the job11_r.sh file itself and its location on the hard drive are not corrupted (reading and writing work)
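
The permission observation above can be checked from the command line. The following is a minimal sketch, not part of the ANTs scripts: the function name `list_nonexecutable_jobs` and the throwaway demo directory are hypothetical, and it assumes the job scripts sit directly in the output directory and match `job*.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list job scripts in an output directory that are
# missing the user-execute bit.
list_nonexecutable_jobs() {
  # -maxdepth 1: look only in the given directory
  # ! -perm -u+x: keep files whose user-execute bit is NOT set
  find "$1" -maxdepth 1 -type f -name 'job*.sh' ! -perm -u+x | sort
}

# Self-contained demo on a throwaway directory (stand-in for the real output directory)
demo_dir="$(mktemp -d)"
touch "$demo_dir/job10_r.sh" "$demo_dir/job11_r.sh"
chmod u+x "$demo_dir/job10_r.sh"   # simulates a job script that was made executable
chmod u-x "$demo_dir/job11_r.sh"   # simulates the script that was skipped
missing="$(list_nonexecutable_jobs "$demo_dir")"
echo "$missing"
```

Pointing the function at the real output directory right after a failed run would show whether the missing job is always the one left without execute permission.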

Screenshot from 2022-04-21 10-20-18

I'm running my computation on a local computer, the command line is the following:

logCmd antsMultivariateTemplateConstruction2.sh -d 3 -i 4 -k 2 -w 0.5x1 -c 2 -j 6 ${MINC_WINTOUT_SLASH} -t SyN  -n 0 -m CC -r 1 -o ${FICHIER_TEMPLATE}  liste_de_fichier_copiee_ici_${NOW}.csv 

I didn't find any problem before the jobs are called, but could something going wrong earlier trigger this?

I can share the data if necessary. Thanks in advance, Best regards, Valéry

valeryozenne avatar Apr 21 '22 08:04 valeryozenne

What happens when you try with antsMultivariateTemplateConstruction.sh?

ntustison avatar Apr 21 '22 13:04 ntustison

Just a hunch, but I somewhat suspect a problem with the periods in your output directory path.

Can you reproduce the problem if there are no periods in the directory or file names before the file extension?

cookpa avatar Apr 21 '22 14:04 cookpa

Thanks for your advice. I did additional tests but I cannot solve or even clearly isolate the problem.

  • Removing all the dots in the directory and file names does not change the issue.

  • The error also occurs if I increase the number of images (N=24): job 23 is missing (see screenshots).

  • Changing the number of cores seems to help. I was using "-c 2 -j 6"; if I change this option to "-c 2 -j 8", the issue persists but occurs less often.

  • Surprisingly, if I run chmod +x on job11.sh and relaunch 'antsMultivariateTemplateConstruction2.sh', I get a template but no Generic.mat / Warp / inverseWarp output files for the corresponding volume.

  • If I use 'antsMultivariateTemplateConstruction.sh', I get additional issues (see errors below), but I'm not familiar with that script, so it could be my fault.

  • I'm now trying another computer with an older version of ANTs. I will let you know.

using N=24

Screenshot from 2022-04-26 10-19-21

using antsMultivariateTemplateConstruction.sh

--------------------------------------------------------------------------------------
 Starting ANTS rigid registration on max 6 cpucores. 
 Progress can be viewed in /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/job*_metriclog.txt
--------------------------------------------------------------------------------------
Using max 6 parallel threads
Running sh /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/job0_r.sh
Running sh /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/job10_r.sh
Running sh /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/job11_r.sh
Running sh /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/job1_r.sh
Running sh /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/job2_r.sh
Running sh /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/job3_r.sh
AFFINE: /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/rigid2_0_S12_TI2_reoriented_N4_resampled_to_0Affine.txt
moving_image_filename: /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/S12_TI2_reoriented_N4_resampled_to_0.5.nii.gz components 1
output_image_filename: /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/rigid2_0_S12_TI2_reoriented_N4_resampled_to_0.5.nii.gz
reference_image_filename: /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/MYtemplate0.nii.gz
[0/1]: AFFINE: /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/rigid2_0_S12_TI2_reoriented_N4_resampled_to_0Affine.txt
User Linear interpolation 
HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
  #000: /home/vozenne/Dev/antsInstallExample/build/ITKv5/Modules/ThirdParty/HDF5/src/itkhdf5/src/H5Fdeprec.c line 156 in itk_H5Fis_hdf5(): unable to determine if file is accessible as HDF5
    major: File accessibility
    minor: Not an HDF5 file
  #001: /home/vozenne/Dev/antsInstallExample/build/ITKv5/Modules/ThirdParty/HDF5/src/itkhdf5/src/H5VLcallback.c line 3769 in itk_H5VL_file_specific(): file specific failed
    major: Virtual Object Layer
    minor: Can't operate on object
  #002: /home/vozenne/Dev/antsInstallExample/build/ITKv5/Modules/ThirdParty/HDF5/src/itkhdf5/src/H5VLcallback.c line 3699 in H5VL__file_specific(): file specific failed
    major: Virtual Object Layer
    minor: Can't operate on object
  #003: /home/vozenne/Dev/antsInstallExample/build/ITKv5/Modules/ThirdParty/HDF5/src/itkhdf5/src/H5VLnative_file.c line 384 in itk_H5VL__native_file_specific(): error in HDF5 file check
    major: File accessibility
    minor: Unable to initialize object
  #004: /home/vozenne/Dev/antsInstallExample/build/ITKv5/Modules/ThirdParty/HDF5/src/itkhdf5/src/H5Fint.c line 1073 in itk_H5F__is_hdf5(): unable to open file
    major: File accessibility
    minor: Unable to initialize object
  #005: /home/vozenne/Dev/antsInstallExample/build/ITKv5/Modules/ThirdParty/HDF5/src/itkhdf5/src/H5FD.c line 723 in itk_H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #006: /home/vozenne/Dev/antsInstallExample/build/ITKv5/Modules/ThirdParty/HDF5/src/itkhdf5/src/H5FDsec2.c line 352 in H5FD__sec2_open(): unable to open file: name = '/workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/rigid2_0_S12_TI2_reoriented_N4_resampled_to_0Affine.txt', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
    major: File accessibility
    minor: Unable to open file
Exception caught during WarpImageMultiTransform.

[...]

/home/vozenne/Dev/antsInstallExample/install/bin/antsMultivariateTemplateConstruction.sh: line 278: 2701277 Segmentation fault      (core dumped) ${ANTSPATH}/AverageImages $dim $output 2 ${images[@]}
summarizeimageset: ERROR - output file /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/MYtemplate0.nii.gz could not be created
ERROR: command exited with nonzero status 1
Command: antsMultivariateTemplateConstruction.sh -d 3 -i 4 -k 1 -c 2 -j 6 -m 225x75x25 -t GR -n 0 -s CC -r 1 -o /workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/Processing_ANTs//Results_Template/Template_Debug_12_Ti2//Resolution_05/Syn_Template_05/MY liste_de_fichier_copiee_ici_04_26_2022_43_07.csv

valeryozenne avatar Apr 26 '22 11:04 valeryozenne

Can you try this example? templateCommandMultivariateBSplineSyN.sh from here

https://github.com/ntustison/TemplateBuildingExample/blob/master/BrainSlices/templateCommandMultivariateBSplineSyN.sh

cookpa avatar Apr 26 '22 13:04 cookpa

It works well. I also tested it with a parallel call using -c 2 -j 6. So something is wrong with either my data or my script. I will keep looking.

valeryozenne avatar Apr 26 '22 16:04 valeryozenne

Looks like this system might be configured in French. Can you export LC_ALL=C to override the localization settings and see if that fixes things?
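
As a minimal sketch of that check: the snippet below sets the override for the current shell session and confirms that numbers are then formatted with a period as the decimal separator. The `formatted` variable is only illustrative, and whether the locale is actually involved in this issue is an assumption to be tested, not a diagnosis.

```shell
#!/usr/bin/env bash
# Override every locale category for the current shell session only
export LC_ALL=C

# Under the C locale the decimal separator is a period, not a comma
formatted="$(printf '%.1f' 0.5)"
echo "LC_ALL=$LC_ALL formatted=$formatted"

# Afterwards, re-run the template construction command from this same shell
# so that it inherits the exported setting.
```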

gdevenyi avatar Apr 26 '22 16:04 gdevenyi

Good suggestion @gdevenyi

I am really puzzled by the intermittent nature of the problem. Is it possible that a disk is filling up or a quota is being enforced?
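
A quick way to rule out a full disk is to look at the free space on the filesystem that holds the output directory. This is a rough sketch: `outdir` is a placeholder for the template output directory, and quota tools differ between systems, so the quota check is only mentioned in a comment.

```shell
#!/usr/bin/env bash
# Placeholder: pass the template output directory as the first argument,
# otherwise the current directory is checked.
outdir="${1:-.}"

# df -P prints one line per filesystem in the POSIX format;
# -k reports 1024-byte blocks; column 4 is the available space.
avail_kb="$(df -Pk "$outdir" | awk 'NR==2 {print $4}')"
echo "Available space under $outdir: ${avail_kb} KiB"

# On systems where user quotas are enabled, `quota -s` (if installed)
# reports the current usage and limits.
```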

cookpa avatar Apr 26 '22 16:04 cookpa

My other thought was something malformed in the CSV? (we haven't seen it). Some of my users helpfully make their files on OSX/windows and end up with broken line endings.
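
For reference, a small sketch of how to test a CSV for Windows line endings and write a cleaned copy. The helper name `has_crlf` and the demo file are hypothetical; the real image list would be passed in place of `demo_csv`.

```shell
#!/usr/bin/env bash
# A literal carriage return character (command substitution keeps it,
# since only trailing newlines are stripped)
cr="$(printf '\r')"

# Hypothetical helper: succeeds if the file contains any carriage return
has_crlf() {
  grep -q "$cr" "$1"
}

# Self-contained demo: a two-line image list written with Windows newlines
demo_csv="$(mktemp)"
printf 'subj01.nii.gz\r\nsubj02.nii.gz\r\n' > "$demo_csv"

if has_crlf "$demo_csv"; then
  echo "CRLF line endings found"
  # Write a cleaned copy with the carriage returns removed
  tr -d '\r' < "$demo_csv" > "${demo_csv}.unix"
fi
```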

gdevenyi avatar Apr 26 '22 16:04 gdevenyi

Yes, Python defaults to Excel style for CSV, which uses Windows newlines regardless of the system.

Some other suggestions:

  1. Ensure each run starts fresh; don't re-run over existing output. Let us know if the problem is reproducible that way.
  2. You can try bash -x antsMultivariateTemplateConstruction2.sh ... | tee debugLog.txt to enable debug mode. This will print a lot of information to the terminal, but might yield some clues.
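
One detail about the debug run: bash -x writes its trace to stderr, so a plain pipe to tee saves only the normal output. A minimal sketch of capturing both follows; it uses a tiny stand-in script (hypothetical) so the example runs anywhere, and in practice the stand-in would be the template construction call with its usual arguments.

```shell
#!/usr/bin/env bash
# Stand-in script for the demo; replace with the real command and arguments
demo_script="$(mktemp)"
printf 'msg="hello"\necho "$msg"\n' > "$demo_script"

log="$(mktemp)"
# The trace lines (prefixed with "+") go to stderr, so 2>&1 merges them
# into stdout before tee writes everything to the log file.
bash -x "$demo_script" 2>&1 | tee "$log"
```

With the redirection in place, the log file contains the executed commands alongside their output, which makes it easier to see which job submission step was skipped.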

cookpa avatar Apr 26 '22 17:04 cookpa

Thanks a lot for all suggestions. Here is my status:

I'm currently lost! Before trying @gdevenyi's suggestion, I re-ran the script, and I can no longer reproduce the problem. I have no idea why.

The .csv were fine. Here is the output of "locale" command:

vozenne@bigcalculo:/workspace_QMRI/PROJECTS_DATA/2022_RECH_Template_Bruker/CODE_ANTs$ locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=

I have some templates to build in the coming days. I will let you know if the problem comes back. I might have some future questions about parcellation.

valeryozenne avatar Apr 26 '22 17:04 valeryozenne

Related to the CSV file, I can confirm that if I make an imagelist.csv the old fashioned way on Mac OS (definitely no Windows newlines), I can run

${ANTSPATH}/antsMultivariateTemplateConstruction2.sh \
  -d 2 \
  -o ${outputPath}T_ \
  -i 4 \
  -g 0.2 \
  -j 4 \
  -c 2 \
  -k 2 \
  -w 1x1 \
  -f 8x4x2x1 \
  -s 3x2x1x0 \
  -q 100x70x50x10 \
  -n 1 \
  -r 1 \
  -l 1 \
  -m CC[2] \
  -t BSplineSyN[0.1,26,0] \
  imagelist.csv

correctly on the brain slices.

cookpa avatar Apr 26 '22 17:04 cookpa