nextflow
symlinks for staged files in directories are not removed in .command.run
Bug report
For a failed job in my Nextflow pipeline, I'm manually running bash .command.run and I'm getting ln: failed to create symbolic link 'DIRECTORY_NAME/FILE_NAME.txt': File exists.
The nxf_stage() function includes:
nxf_stage() {
true
# stage input files
mkdir -p 164164 && ln -s /home/nickyoungblut/tmp/work/ad/d0abcbad4c7b9137844e3ba48c8af4/KAPA_mRNA-enrichment_HumanRefRNA_500ng_1e-2dilution_20240417_C01_R1_001/summary.txt 164164/summary.txt
mkdir -p 6262 && ln -s /home/nickyoungblut/tmp/work/53/f75d0ff2aa376187fafae661a5b400/DJv3_NT1_ctrl_rep1_031524_R1_001/fastqc_data.txt 6262/fastqc_data.txt
mkdir -p 284284 && ln -s /home/nickyoungblut/tmp/work/be/b0b67f565d4ae5a0230e453acaa236/DJv2_FTH1_kd_rep2_031524_R2_001/fastqc_data.txt 284284/fastqc_data.txt
[...]
}
The symlinks are not removed via rm -f prior to recreating them in the nxf_stage() function, and ln -s is used instead of ln -sf. This results in the error when manually re-running .command.run. This makes troubleshooting failed jobs harder, since I have to manually delete the existing symlinks or comment out all of the ln -s commands in nxf_stage().
This issue does not occur for files not in staged directories, just for mkdir -p new_directory && ln -s new_directory/new_file.txt.
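To make the failure mode concrete, here is a minimal sketch that mimics the staging lines above in a scratch directory. The directory and file names (upstream, 1/summary.txt) are hypothetical stand-ins, not paths from the real pipeline, and the script only imitates what nxf_stage() does rather than calling any Nextflow code.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical scratch area standing in for a task work directory.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# A regular file standing in for an upstream task output.
mkdir -p upstream
echo "example" > upstream/summary.txt

# First staging pass: same shape as the generated nxf_stage() lines.
mkdir -p 1 && ln -s "$workdir/upstream/summary.txt" 1/summary.txt
first_status=$?

# Second pass with plain `ln -s`: fails because 1/summary.txt already exists.
mkdir -p 1 && ln -s "$workdir/upstream/summary.txt" 1/summary.txt 2>/dev/null
plain_status=$?

# Second pass with `ln -sf`: replaces the existing link, so a re-run succeeds.
mkdir -p 1 && ln -sf "$workdir/upstream/summary.txt" 1/summary.txt
forced_status=$?

echo "first=$first_status plain=$plain_status forced=$forced_status"
```

The plain ln -s re-run should report a non-zero status while the ln -sf re-run succeeds. One caveat if ln -sf were adopted: when the existing link points to a directory, ln -sf creates the new link inside that directory unless -n is also given, so ln -sfn may be the safer form for staged directories.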
Expected behavior and actual behavior
See above
Steps to reproduce the problem
This should occur for any pipeline that creates staged files in directories: mkdir -p new_directory && ln -s new_directory/new_file.txt
Program output
See above
Environment
- Nextflow version: 23.10.1
- Java version: openjdk 21
- Operating system: Linux
- Bash version: 5.2.15
Additional context
This is likely because you have many staged files, see here
https://github.com/nextflow-io/nextflow/blob/aa9e127373de3bc0b4b78640279336cdd6d003aa/modules/nextflow/src/main/groovy/nextflow/executor/SimpleFileCopyStrategy.groovy#L122-L133
Thanks @pditommaso for pointing that out! What is the problem with including possibly a few 1000 more lines in the runner script?
It's explained in the comment: to contain the script file size. You can delete all symlinks using a Bash one-liner like find . -type l -delete or something similar.
Why does the file size need to be contained to <100 lines of symlink removal? Extending to thousands of lines would not add much size to the file.
You can delete all symlinks using a Bash one-liner like find . -type l -delete or something similar
Why not just use find . -type l -delete instead of removing each symlink individually in the runner script?
lol. Need to think if there could be other links. @bentsherman opinion?
need to think if there could be other links
I thought all symlinks were (re)created by the runner script, but maybe I'm mistaken?
Deleting all links should be fine, I can't think of any other links that are created. But Nick also suggested using ln -sf instead of deleting the links, maybe that would be better
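As a rough illustration of the cleanup approach discussed above, the sketch below builds a scratch directory with one nested and one top-level symlink, then runs find . -type l -delete. All names here (upstream, top_level.txt, the stand-in .command.sh) are hypothetical, and it assumes a find that supports -delete, as GNU find on Linux does.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical scratch area standing in for a task work directory.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Regular files that the cleanup must leave alone.
mkdir -p upstream 1
echo "example" > upstream/summary.txt
echo "echo hello" > .command.sh

# One symlink inside a staged directory, one at the top level.
ln -s "$workdir/upstream/summary.txt" 1/summary.txt
ln -s "$workdir/upstream/summary.txt" top_level.txt

# Remove symbolic links only, at any depth. find does not follow
# symlinks by default, so the link targets are untouched.
find . -type l -delete

# Count what is left; tr strips padding that some wc versions emit.
remaining_links="$(find . -type l | wc -l | tr -d ' ')"
echo "remaining symlinks: $remaining_links"
```

For manual troubleshooting, the same one-liner could be run in the failed task's work directory before bash .command.run. Whether it is safe to embed in the generated script depends on the open question above, namely whether the task directory can contain links that the script does not recreate.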
I just hit this problem too when using nextflow -resume:
Command exit status:
1
Command output:
(empty)
Command wrapper:
ln: failed to create symbolic link 'prop_summary.json': File exists
Not sure if I understand the comments above; this just looks like a bug?