
CTF estimation creates symlinks with absolute paths

Open DimitriosBellos opened this issue 10 months ago • 10 comments

Dear Relion developers,

Hi, my name is Dimitrios Bellos and I am a member of the AI & I team at the Rosalind Franklin Institute. Our team helps support Franklin RELION users with their issues.

Recently we noticed problems arising from the fact that the CTF estimation step creates symlinks to the motion-corrected data using absolute paths.

For example, in CtfFind/job003/: 'Position_99_035[-61_00]_EER_PS.mrc' -> '/<absolute-path>/MotionCorr/job002/<data-directory>/Position_99_035[-61_00]_EER_PS.mrc'

This can cause problems if the whole RELION project directory is moved, which is common for us: a project directory may be moved from our compute infrastructure to the Baskerville HPC and vice versa. Would it be possible to create relative symlinks instead? For example: 'Position_99_035[-61_00]_EER_PS.mrc' -> '../../../MotionCorr/job002/<data-directory>/Position_99_035[-61_00]_EER_PS.mrc'. The relative path can be generated with the realpath command (see https://stackoverflow.com/questions/2564634/convert-absolute-path-into-relative-path-given-a-current-directory-using-bash ).
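As a user-side stopgap, existing absolute links could be rewritten as relative ones. The following is a minimal sketch, assuming GNU coreutils (`readlink`, `realpath --relative-to`, `ln`); the function name `make_link_relative` is hypothetical, and it post-processes the links rather than changing how RELION creates them.

```shell
#!/bin/bash
# Hypothetical helper (not part of RELION): rewrite one absolute symlink so it
# stores a path relative to the directory that contains the link.
# Assumes GNU coreutils for `realpath --relative-to` and `realpath -m`.
make_link_relative() {
    local link="$1"
    local target link_dir rel

    # The target exactly as stored in the link, without resolving it
    target=$(readlink -- "$link") || return 1

    # Links that are already relative are left untouched
    [[ "$target" == /* ]] || return 0

    link_dir=$(dirname -- "$link")

    # Path of the target as seen from the link's directory.
    # realpath resolves symlinks in both paths; -m tolerates missing components.
    rel=$(realpath -m --relative-to="$link_dir" -- "$target") || return 1

    # -f replaces the existing link; -n stops ln from descending into a
    # link that points to a directory
    ln -sfn -- "$rel" "$link"
}
```

For example, every link in a job directory could be converted with `find CtfFind/job003 -type l -print0 | while IFS= read -r -d '' l; do make_link_relative "$l"; done`. This has to run before the project is moved, while the absolute targets still point at the right place.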

Alternatively, could it be done without using symlinks at all?

Kind regards, Dimitrios Bellos

DimitriosBellos, Apr 22 '24 10:04

> This can cause issues if the whole Relion Project directory is moved.

I doubt this. These links are created by a CTFFIND job and used only by the job itself. Thus, unless you move the project directory before the job completes, it should be fine. Am I missing other failure modes?
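Whatever the answer to that question, the symptom of a moved project is easy to check for: links whose targets no longer resolve. A minimal, portable sketch follows; the function name `list_broken_links` is hypothetical, and on GNU systems `find DIR -xtype l` is a shorter equivalent.

```shell
#!/bin/bash
# Hypothetical helper: print every symlink under a directory (default: the
# current one) whose target cannot be reached.
list_broken_links() {
    local dir="${1:-.}"
    # -type l selects symlinks without following them; `test -e` follows the
    # link, so it fails exactly when the link is dangling
    find "$dir" -type l ! -exec test -e {} \; -print
}
```

For example, `list_broken_links CtfFind/job003` after copying a project to a new location; an empty result means every link under that job still resolves.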

biochem-fan, Apr 22 '24 23:04

> [-61_00]

Oh, this is STA, not SPA. I know nothing about the STA workflow. STA related issues need to be dealt with by others.

biochem-fan, Apr 22 '24 23:04

To help, this is the script we run on the HPC:

#!/bin/bash

#SBATCH --qos=rfi
#SBATCH --account=<account-name>
#SBATCH --time=0-01:00:00
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-task=1

module purge
module load baskerville
module load RELION

# Import (relion)
mkdir -p Import/job001
time relion_import  --do_movies  --optics_group_name "opticsGroup1" --angpix 1.85 --kV 300 --Cs 2.7 --Q0 0.1 --beamtilt_x 0 --beamtilt_y 0 --i "data/HeLa_argon/Position_*.eer" --odir Import/job001/ --ofile movies.star --pipeline_control Import/job001/

# Motion correction (relion)
time srun "$(which relion_run_motioncorr_mpi)" --i Import/job001/movies.star --o MotionCorr/job002/ --first_frame_sum 1 --last_frame_sum -1 --use_own  --j 4 --float16 --bin_factor 1 --bfactor 150 --dose_per_frame 0.14 --preexposure 0 --patch_x 5 --patch_y 5 --eer_grouping 32 --gain_rot 0 --gain_flip 0 --dose_weighting  --grouping_for_ps 29   --pipeline_control MotionCorr/job002/

# CTF estimation (relion)
time srun "$(which relion_run_ctffind_mpi)" --i MotionCorr/job002/corrected_micrographs.star --o CtfFind/job003/ --Box 512 --ResMin 30 --ResMax 5 --dFMin 5000 --dFMax 50000 --FStep 500 --dAst 100 --ctffind_exe ctffind --ctfWin -1 --is_ctffind4  --fast_search  --use_given_ps   --pipeline_control CtfFind/job003/

DimitriosBellos, Apr 23 '24 12:04

The problem appears after the CTF estimation process completes. Inside <RELION-project-directory-name>/CtfFind/job003/, a directory is created with the same structure as the data directory at the RELION project level (<RELION-project-directory-name>/data/HeLa_argon/), i.e. <RELION-project-directory-name>/CtfFind/job003/data/HeLa_argon/, and many symlinks are created in it.

If these symlinks are no longer needed once CTF estimation has completed, could you please add a step that deletes them at the end of the job?

If they are still needed after CTF estimation has completed, could you change the code so that they are created with relative rather than absolute paths? That way the symlinks will not break even if the whole RELION project directory is moved elsewhere.

It is a minor issue, but if the symlinks need to remain after CTF estimation completes, creating them in a form that cannot break when the entire project directory is moved would be very useful.
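The difference can be shown with a small self-contained script. The directory and file names below are made up and only mimic the layout described in this issue; the function name `demo_move` is hypothetical.

```shell
#!/bin/bash
# Demo: an absolute symlink breaks when the project directory is moved,
# while a relative one keeps working. Everything lives in a temporary dir.
demo_move() {
    local tmp proj link
    tmp=$(mktemp -d)
    proj="$tmp/project"
    mkdir -p "$proj/MotionCorr/job002/data" "$proj/CtfFind/job003/data"
    touch "$proj/MotionCorr/job002/data/movie_PS.mrc"

    # Absolute link, like the ones described in this issue
    ln -s "$proj/MotionCorr/job002/data/movie_PS.mrc" "$proj/CtfFind/job003/data/abs_PS.mrc"
    # Relative link: three levels up from CtfFind/job003/data is the project root
    ln -s ../../../MotionCorr/job002/data/movie_PS.mrc "$proj/CtfFind/job003/data/rel_PS.mrc"

    # Move the whole project, as when copying it to another cluster
    mv "$proj" "$tmp/moved_project"

    for link in abs_PS.mrc rel_PS.mrc; do
        if [[ -e "$tmp/moved_project/CtfFind/job003/data/$link" ]]; then
            echo "$link: OK"
        else
            echo "$link: broken"
        fi
    done
    rm -rf "$tmp"
}
```

Running `demo_move` prints `abs_PS.mrc: broken` followed by `rel_PS.mrc: OK`: the relative link only encodes positions inside the project, so it survives any move of the project as a whole.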

DimitriosBellos, Apr 23 '24 12:04

The symlinks are not used after the job completes, as far as SPA is concerned. I think (not confirmed) they get deleted when a user runs "Gentle Clean" on the job from the GUI.

> can you please add a step to delete them after the CTF estimation is completed?

This is a valid suggestion, but because the links are harmless (and nobody has complained for at least five years), it is a low priority for me. A pull request is welcome.

biochem-fan, Apr 23 '24 13:04

No problem. If deleting the links is normally left to the GUI, we can handle the deletion ourselves.

We are currently writing production scripts, so that a Slurm script submitted to an HPC cluster runs a sequence of processes one after another automatically. For this reason, we run RELION solely from the command line.

It might be a good idea to mention in the documentation, for those who run RELION only from the command line, that any symlinks created by CTFFind can safely be deleted once the job finishes.
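For command-line pipelines like the Slurm script above, such a cleanup could be added as a final step. A minimal sketch, assuming (a) the job directory contains the RELION_JOB_EXIT_SUCCESS marker that RELION writes on success when --pipeline_control is given (worth checking for your RELION version), and (b) the links to remove are the *_PS.mrc power-spectrum links described in this thread. The function name `cleanup_ctffind_links` is hypothetical.

```shell
#!/bin/bash
# Hypothetical cleanup step (not part of RELION): after a successful CTFFIND
# job, remove the *_PS.mrc symlinks left in its output directory.
cleanup_ctffind_links() {
    local job_dir="$1"
    # Only clean up after a successful run (assumed marker file)
    if [[ ! -f "$job_dir/RELION_JOB_EXIT_SUCCESS" ]]; then
        echo "No RELION_JOB_EXIT_SUCCESS in $job_dir; leaving links in place" >&2
        return 1
    fi
    # -type l matches symlinks only, so regular output files and the
    # MotionCorr power spectra the links point to are never removed
    find "$job_dir" -type l -name '*_PS.mrc' -exec rm -f -- {} +
}
```

For example, `cleanup_ctffind_links CtfFind/job003` on the line after the `relion_run_ctffind_mpi` call. Checking for the success marker first keeps the links in place if the job failed and needs to be continued.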

Happy for you to close the issue.

DimitriosBellos, Apr 24 '24 07:04

FYI:

  • Gentle clean can be invoked from the command line: relion_pipeliner --gentle_clean
  • Have you looked at relion_it.py, which is based on RELION Schemes? If you use it (or relion_pipeliner), the job history is recorded properly, so a user can open the GUI on the output folder of your automatic processing pipeline, inspect what has been done, and continue data processing.

Unfortunately I cannot help with the latter because I don't use the feature myself.

biochem-fan, Apr 24 '24 08:04

Just want to confirm that STA behaves in the exact same way as SPA here. The PS.mrc files get generated during motioncorr and are only temporarily symlinked. Yes, deleting them would be cleaner, but this should not cause any issues.

scheres, Apr 30 '24 08:04

Hello, I have the same problem. When I run CTF estimation from a shell script, it creates full (absolute) paths in the CtfFind directory, and I do not know how to solve this.

Also, when I run CTF estimation, I get the error “ERROR: Failed to make a symlink from A to B”, even though the symlink already exists under CtfFind. How did you solve this?

xinsheng44, May 04 '24 13:05

> The symlinks are not used after the job completion as far as SPA is concerned. I think (not confirmed) they get deleted when a user "Gentle Clean" the job from the GUI.
>
> > can you please add a step to delete them after the CTF estimation is completed?
>
> This is a valid suggestion but because it is harmless (and nobody complained for at least five years), my priority is low. A pull request is welcomed.

I tested with both SPA and STA, and both had the same problem.

xinsheng44, May 04 '24 13:05