Bowtie2 "find" process fails to read symlinks when mounted in CIFS
Description of the bug
Hello,
I am running TaxProfiler on WSL2, with the work and results directories located on a NAS mounted using CIFS. The work directory must be located on the NAS because it becomes too big to be stored locally. However, when running the pipeline (see the command below), it stops at the Bowtie2 align step of host removal. The reason is that the command find -L ./ -name "*.rev.1.bt2" | sed "s/\.rev.1.bt2$//" fails to read the symlink that is created within the work directory for that process, which points to the Bowtie2 index files previously computed by the pipeline. I believe this happens because find itself fails to read symlinks on a mounted NAS, depending on the protocol and mount command used. I already checked that the symlink exists and that the index files can be found there, so it is the command that fails, not the files being absent.
However, if instead of using find I change that line (line 53 of the bowtie2/align/main.nf module) to something like INDEX=\$(ls -1 bowtie2/*.rev.1.bt2 2>/dev/null | sed 's/\\.rev\\.1\\.bt2\$//'), then it reads the symlink, stores the index basename in the variable $INDEX as expected, and the pipeline proceeds as normal. I do not know the implications of this for the overall pipeline, but from my humble knowledge, I would suggest replacing the find command with an ls-based option to prevent this error from happening. This would need to be changed in lines 53 and 54 of the script modules/nf-core/bowtie2/align/main.nf. I do not know whether this happens with any processes within TaxProfiler other than the ones I tested myself. If so, find would need to be changed and tested there too.
Otherwise, can you come up with another general solution when find fails to read the symlink created in the work directory and returns empty?
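For illustration, here is a minimal sketch of the glob-based lookup described above, written as a standalone bash function. The function name bt2_index_basename and the search depth (the directory itself plus one level down) are assumptions made for this example, not what the nf-core module does. Unlike the find pipeline, it returns only the first match, and it has not been verified on a CIFS mount.

```shell
#!/usr/bin/env bash
# Sketch: locate a Bowtie2 index basename with shell globbing instead of `find -L`.
# `bt2_index_basename` is a hypothetical helper name, not part of nf-core.

bt2_index_basename() {
    local dir="${1:-.}" ext f
    # Try small indexes (.bt2) first, then large ones (.bt2l), as the module does.
    for ext in bt2 bt2l; do
        # Look in the directory itself and one level down (e.g. a staged bowtie2/ symlink).
        for f in "$dir"/*.rev.1."$ext" "$dir"/*/*.rev.1."$ext"; do
            # An unmatched glob stays literal, so confirm the path really exists.
            if [ -e "$f" ]; then
                # Strip the suffix to get the basename that `bowtie2 -x` expects.
                printf '%s\n' "${f%.rev.1.${ext}}"
                return 0
            fi
        done
    done
    echo "Bowtie2 index files not found" >&2
    return 1
}
```

Inside a task work directory it could be used as INDEX=$(bt2_index_basename .) || exit 1, which keeps the same "not found" failure mode as the original lines.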
The command used:
nextflow run nf-core/taxprofiler -profile docker -resume --input /mnt/NAS/taxprofiler_files/sample_sheet.csv --databases /mnt/NAS/taxprofiler_files/db_file.csv --outdir /mnt/NAS/Taxprofiler_PROJECT_01092025 --run_motus --perform_shortread_qc --perform_runmerging --save_runmerged_reads --save_analysis_ready_fastqs --shortread_qc_mergepairs --perform_shortread_hostremoval --save_hostremoval_bam --save_hostremoval_unmapped --run_profile_standardisation --hostremoval_reference /mnt/NAS/general_resources/T2T-CHM13_v2.0/hs1.fa -work-dir /mnt/NAS/work_03092025
Thank you for your time,
Best regards
Samuel
Command used and terminal output
nextflow run nf-core/taxprofiler -profile docker -resume --input /mnt/NAS/taxprofiler_files/sample_sheet.csv --databases /mnt/NAS/taxprofiler_files/db_file.csv --outdir /mnt/NAS/Taxprofiler_PROJECT_01092025 --run_motus --perform_shortread_qc --perform_runmerging --save_runmerged_reads --save_analysis_ready_fastqs --shortread_qc_mergepairs --perform_shortread_hostremoval --save_hostremoval_bam --save_hostremoval_unmapped --run_profile_standardisation --hostremoval_reference /mnt/NAS/general_resources/T2T-CHM13_v2.0/hs1.fa -work-dir /mnt/NAS/work_03092025
---------------------------------------------------
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/taxprofiler] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_HOSTREMOVAL:BOWTIE2_ALIGN (TB007)'
Caused by:
Process `NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_HOSTREMOVAL:BOWTIE2_ALIGN (TB007)` terminated with an error exit status (1)
Command executed:
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\.rev.1.bt2$//"`
[ -z "$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\.rev.1.bt2l$//"`
[ -z "$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1
bowtie2 \
-x $INDEX \
-U TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.merged.fastq.gz \
--threads 12 \
--un-gz TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.unmapped.fastq.gz \
\
2>| >(tee TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.bowtie2.log >&2) \
| samtools sort --threads 12 -o TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.bam -
if [ -f TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.unmapped.fastq.1.gz ]; then
mv TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.unmapped.fastq.1.gz TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.unmapped_1.fastq.gz
fi
if [ -f TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.unmapped.fastq.2.gz ]; then
mv TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.unmapped.fastq.2.gz TB007_TB007_EKDN250015205-1A_22VGFWLT4_L6.unmapped_2.fastq.gz
fi
cat <<-END_VERSIONS > versions.yml
"NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_HOSTREMOVAL:BOWTIE2_ALIGN":
bowtie2: $(echo $(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*$//')
samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
pigz: $( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
Bowtie2 index files not found
Work dir:
/mnt/NAS/work_03092025/cf/266a7cc6486141a9978677e32eaed9
Container:
community.wave.seqera.io/library/bowtie2_htslib_samtools_pigz:edeb13799090a2a6
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
Relevant files
No response
System information
Nextflow version: 25.04.6 Executor: local OS: WSL2 running on Windows 11 Version: nf-core/taxprofiler 1.2.4 (revision: 1385e2f895 [master])
Hi @sghuete !
I just want to acknowledge receipt of the issue and confirm that I've not missed it.
I want to read your issue properly before responding, because it seems like it is actually more of a Nextflow issue (Windows/WSL plus a NAS is very much not a typical setup), but I'm travelling this week (sending this from my phone).
I'll try and get back to you ASAP
Hi @sghuete sorry that it has taken so long to get back to you - I was traveling the last two weeks thus it was hard to find time to sit down and focus on it.
Your issue and your proposed solution (and thank you for investigating so thoroughly and finding a possible one!) actually have much larger ramifications. The find command system is present across MANY nf-core modules (and thus pipelines). So if we switch to a different system, we need to update at least 33 different modules: https://github.com/search?q=repo%3Anf-core%2Fmodules%20%60find%20-L&type=code
I am going to propose switching the system to the community, and see what they say.
One thing though: like I said above, WSL/Windows-based systems are quite uncommon for high-throughput bioinformatics and are not ideal - even Nextflow doesn't really support CIFS directly. So I think you have a sort of edge case here... but to me your solution is relatively straightforward, so it might be OK.
I'll report back once I've heard back from the wider community (if you are on the nf-core slack, please let me know your handle there so I can ping you in the discussion for you to follow)
Hello @jfy133,
Thanks for your answer. Yes, I knew this would affect other nf-core modules. If I am fully honest, I do not understand why the problem is restricted to the bowtie2 module in my case and does not affect any other modules. I just know that, by modifying line 53 of the bowtie2/align/main.nf script as above, it worked perfectly fine and the pipeline completed successfully. Why did find work for other modules but not for bowtie2? No idea.
I am aware as well that my setup is not quite canonical, but it is also true that WSL users are increasing, and this change should not affect the performance of the pipelines in native Linux environments or any other bash-based setup.
Furthermore, I thought a bit on using find vs ls options, and I think that find -L needs to first lookup/resolve symlinks (via stat()/readlink()) to follow them and look for the target, which may return errors (EIO, permissions, etc.). This won't happen with ls because ls will still list the symlink itself (via lstat(), though possibly also with limited info) without resolving the target or following across mounts. Even if the symlink target is inaccessible or across filesystem boundaries, ls still shows the name of the symlink (the directory entry). Also, I did not test this, but I figure ls might have a better overall performance when handling complex directories because find needs to "look" for the file while ls + sed just "captures" the pattern, which is what the pipeline needs to work.
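To make the lstat()/stat() distinction above concrete, here is a small diagnostic sketch that could be run inside a task work directory. It relies on the fact that the shell test -L inspects the link itself while -e follows it to the target. The helper name check_symlinks is hypothetical, and whether a CIFS-mounted symlink shows up as BROKEN here is an assumption that has not been tested.

```shell
#!/usr/bin/env bash
# Sketch: report, for each symlink in a directory, whether its target resolves.
# `check_symlinks` is a hypothetical helper name, not part of Nextflow or nf-core.

check_symlinks() {
    local dir="${1:-.}" entry status=0
    # Visit regular and hidden entries; unmatched globs stay literal and are skipped below.
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # -L uses lstat(): true for the link itself, even if the target is unreachable.
        [ -L "$entry" ] || continue
        # -e uses stat(): true only if the link can be followed to an existing target.
        if [ -e "$entry" ]; then
            printf 'OK      %s -> %s\n' "$entry" "$(readlink "$entry")"
        else
            printf 'BROKEN  %s -> %s\n' "$entry" "$(readlink "$entry")"
            status=1
        fi
    done
    # Non-zero exit status if at least one link could not be followed.
    return "$status"
}
```

Running check_symlinks . in the failing work directory would show whether the staged bowtie2 link can be followed at all from inside the container, which separates a symlink-resolution problem from a find-specific one.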
Anyways, you guys know way more than me about Nextflow and nf-core so I am happy to discuss it further or find any other solution to this find command problem. I am on the nf-core slack and my handle is U08PT555XGU.
Thank you for your time.
Best regards
Samuel
Hi @sghuete !
Thanks for getting back to me.
We are trying to explore Groovy methods rather than bash, to maybe get around this find method entirely (but I ran out of time today). I will try to get back to it next week (away tomorrow).
Otherwise I could not find you on the nf-core slack nor with that ID... here is a direct link to the (long) thread: https://nfcore.slack.com/archives/C043UU89KKQ/p1759999518188429 .
In the meantime, @sghuete, someone has suggested (via an LLM) the following mount command:
sudo mount -t cifs -o vers=3.0,mfsymlinks,username=user,password=pass //nas_server/share_name /mnt/nas_mount_point
Might that allow you to use find with WSL?
Hello @jfy133,
My mount options for CIFS were:
sudo mount -t cifs //nas_server/share_name /mnt/nas_mount_point -o vers=3.0,username=user,mfsymlinks,uid=$(id -u),gid=$(id -g),file_mode=0777,dir_mode=0777
So, virtually, the same options with a little twist on the user ID/group ID and file modes to prevent permission and access errors. And these are the options that gave rise to my problem, so find did not resolve the symlinks even with such a configuration. Again, I am not sure as to why it failed just with the find command within Bowtie2 and not the others, so I believe it must be a bit more complex than just CIFS options.
Best regards
Samuel
OK, thanks for following up. I will try and continue experimenting and discussing with Jim and Mahesh :)
Absent a solution using Groovy/Nextflow to list files, can I suggest that the solution should be to be more explicit in Nextflow with inputs/outputs?
BOWTIE2_INDEX:
output:
tuple val(meta), val(basename), val(file1), val(file2) <all index files individually>, emit: index
script:
basename = <groovy code that generates the basename>
BOWTIE2_ALIGN:
input:
tuple val(meta), val(basename), val(file1), val(file2) <all index files individually>
script:
"""
bowtie2 align --reference ${basename}
"""
This is more work, since every module needs fixing, but it is more explicit (good!), avoids problems like the index being a directory with unknown files in it or a list of files, and explicitly takes in the required basename of the index as an input.
Hello @jfy133,
Do you have any updates on this issue? With my current solution (ls-based) I cannot update my nf-core pipelines to the latest versions; otherwise, I would need to manually modify all the find commands in there.
Thank you for your time,
Best regards
Samuel
The current status is that there is a possible(!) fix in the works at the Nextflow level:
- Issue: https://github.com/nextflow-io/nextflow/issues/6488
- PR: https://github.com/nextflow-io/nextflow/pull/6581
But until it's merged I won't be able to test, and even then we need to wait for a stable Nextflow release.
I completely missed the suggestion from Jim above, which could be one option (if you would be willing to update the modules in nf-core/modules, @sghuete), but it is very much not ideal, as I believe some of the tools actually generate different numbers of index files depending on different things... so it would not be portable.
Unfortunately my coding time has been extremely limited the last few weeks, so I've not been able to explore any further, sorry about that :/
Aha it was just merged on the nextflow end, so we will be able to test it soon @sghuete !
@jfy133 Cool, thanks! Please, do keep me posted!