non-zero exit status (with --target-indexed?)
Hi,
In 2.1.11 and 2.1.12, when running SingleM I get segmentation faults running DIAMOND as a subprocess.
https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR7151488&display=metadata
pixi run singlem pipe --forward ~/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --otu-table /dev/stdout
06/18/2025 07:46:25 AM INFO: SingleM v0.19.0
06/18/2025 07:46:25 AM INFO: Retrieval successful. Location of backpack is: /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb
06/18/2025 07:46:26 AM INFO: Loaded 59 SingleM packages
06/18/2025 07:46:26 AM INFO: Using as input 1 different sequence files e.g. /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz
06/18/2025 07:46:28 AM INFO: Filtering sequence files through DIAMOND blastx
06/18/2025 07:47:22 AM ERROR: Process (DIAMOND?) failed
Traceback (most recent call last):
File "/mnt/hpccs01/home/woodcrob/git/singlem/.pixi/envs/default/bin/singlem", line 10, in <module>
sys.exit(main())
^^^^^^
File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/main.py", line 741, in main
singlem.pipe.SearchPipe().run(
File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/pipe.py", line 73, in run
otu_table_object = self.run_to_otu_table(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/pipe.py", line 402, in run_to_otu_table
raise e
File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/pipe.py", line 395, in run_to_otu_table
self._num_threads, self._working_directory).run_diamond(
^^^^^^^^^^^^
File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/diamond_spkg_searcher.py", line 34, in run_diamond
fwds = self._prefilter(dmnd, forward_read_files, False, performance_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/diamond_spkg_searcher.py", line 95, in _prefilter
qseqid_sseqid = extern.run(cmd)
^^^^^^^^^^^^^^^
File "/mnt/hpccs01/home/woodcrob/git/singlem/.pixi/envs/default/lib/python3.12/site-packages/extern/__init__.py", line 41, in run
raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command diamond blastx --outfmt 6 qseqid full_qseq sseqid --max-target-seqs 1 --evalue 0.01 --block-size 0.5 --target-indexed -c1 --min-orf 24 --query-gencode 4 --threads 1 --query /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --db /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd | tee >(sed 's/^/>/; s/\t/\n/; s/\t.*//' > /tmp/tmps0pbyx0u/prefilter_forward/20110600_E1D.1.fna) | awk '{print $1,$3}' returned non-zero exit status 139.
STDERR was: b"bash: line 1: 375636 Segmentation fault (core dumped) diamond blastx --outfmt 6 qseqid full_qseq sseqid --max-target-seqs 1 --evalue 0.01 --block-size 0.5 --target-indexed -c1 --min-orf 24 --query-gencode 4 --threads 1 --query /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --db /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd\n 375637 Done | tee >(sed 's/^/>/; s/\\t/\\n/; s/\\t.*//' > /tmp/tmps0pbyx0u/prefilter_forward/20110600_E1D.1.fna)\n 375638 Done | awk '{print $1,$3}'\n"
STDOUT was: b''
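For readers unfamiliar with the number in the error above: bash reports a process killed by signal N as exit status 128 + N, so 139 corresponds to signal 11 (SIGSEGV), which matches the "Segmentation fault" message in STDERR. A minimal Python sketch of that decoding follows; `describe_exit_status` is a hypothetical helper written for illustration and is not part of SingleM, extern or DIAMOND.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a bash-style exit status (hypothetical helper for illustration).

    Assumes the bash convention that a status above 128 means the process
    was terminated by signal (status - 128). A program can also exit with
    such a code on its own, so treat the result as a hint, not proof.
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with error code {status}"


print(describe_exit_status(139))
```

On Linux this labels 139 as a SIGSEGV kill, whereas a plain status such as 1 only indicates a generic failure.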
Drilling down, DIAMOND itself appears to be failing, although the segfault message does not show up when I run it directly (so maybe it was bash that was segfaulting?), e.g.
(conda)cpu1n022:20250618:~/git/singlem$ pixi run diamond blastx --outfmt 6 qseqid full_qseq sseqid --max-target-seqs 1 --evalue 0.01 --block-size 0.5 --target-indexed -c1 --min-orf 24 --query-gencode 4 --threads 1 --query /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --db /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd -o test.out
diamond v2.1.12.166 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
#Target sequences to report alignments for: 1
Opening the database... [0.005s]
Database: /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd (type: Diamond database, sequences: 105024, letters: 8081808)
Block size = 500000000
Opening the input file... [0.17s]
Opening the output file... [0s]
Loading query sequences... [4.831s]
Masking queries... [26.176s]
Algorithm: Double-indexed
Seeking in database... [0s]
Loading reference sequences... [0.01s]
Masking reference... [0.338s]
Initializing temporary storage... [0.012s]
Building reference histograms... [0.141s]
Allocating buffers... [0s]
Loading database seed index... [0.011s]
Processing query block 1, reference block 1/1, shape 1/2.
Building reference seed array... [0.055s]
Building query seed array... (conda)cpu1n022:20250618:~/git/singlem$ echo $?
1
Removing some options (--block-size 0.5, --target-indexed and -c1), it runs to completion:
$ pixi run diamond blastx --outfmt 6 qseqid full_qseq sseqid --max-target-seqs 1 --evalue 0.01 --min-orf 24 --query-gencode 4 --threads 1 --query /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --db /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd -o test.out
diamond v2.1.12.166 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
#Target sequences to report alignments for: 1
Opening the database... [0.006s]
Database: /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd (type: Diamond database, sequences: 105024, letters: 8081808)
Block size = 2000000000
Opening the input file... [0.17s]
Opening the output file... [0s]
Loading query sequences... [19.764s]
Masking queries... [104.82s]
Algorithm: Double-indexed
Building query histograms... [56.316s]
Seeking in database... [0s]
Loading reference sequences... [0.009s]
Masking reference... [0.339s]
Initializing temporary storage... [0.031s]
Building reference histograms... [0.239s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array... [0.099s]
Building query seed array... [23.402s]
Computing hash join... [4.114s]
Masking low complexity seeds... [0.096s]
Searching alignments... [0.586s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
...
Computing alignments... [0.867s]
Deallocating reference... [0s]
Loading reference sequences... [0s]
Deallocating buffers... [0s]
Deallocating queries... [0.002s]
Loading query sequences... [0s]
Closing the input file... [0s]
Closing the output file... [0.007s]
Closing the database... [0s]
Cleaning up... [0s]
Total time = 480.815s
Reported 38659 pairwise alignments, 38659 HSPs.
38659 queries aligned.
I wonder if this is a backwards incompatibility of the index files? They were created with 2.1.10 I am pretty sure.
Thanks in advance, ben
FWIW the specific dataset is linked below, but the problem shows up for lots of (not tiny?) samples: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR7151488&display=metadata
Looks like there is a problem with --target-indexed. It will be fixed. What's your reason for using it? It was made for very particular applications only.
We found it was faster for our particular use case (a quite small DB). Thanks for the fix.
It should be fixed in the latest release. The index format has never been changed.
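For anyone who still has to run on one of the affected releases, one possible workaround is to drop `--target-indexed` only for those versions. The sketch below is a rough illustration under stated assumptions, not how SingleM handles it: `parse_diamond_version` and `target_indexed_flag` are hypothetical helpers, the affected set contains only the two releases named in this thread (2.1.11 and 2.1.12), and treating every other version as safe is an assumption. The version is read from a banner line of the form shown in the logs above.

```python
import re

# Releases reported as affected in this thread; an assumption, not an
# exhaustive or verified list.
AFFECTED_VERSIONS = {(2, 1, 11), (2, 1, 12)}


def parse_diamond_version(banner: str) -> tuple:
    """Extract (major, minor, patch) from a banner like 'diamond v2.1.12.166 ...'."""
    match = re.search(r"diamond v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"no DIAMOND version found in: {banner!r}")
    return tuple(int(part) for part in match.groups())


def target_indexed_flag(banner: str) -> list[str]:
    """Return ['--target-indexed'] unless the banner names an affected release."""
    if parse_diamond_version(banner) in AFFECTED_VERSIONS:
        return []
    return ["--target-indexed"]


banner = "diamond v2.1.12.166 (C) Max Planck Society for the Advancement of Science"
print(parse_diamond_version(banner), target_indexed_flag(banner))
```

The returned list can be appended to the rest of the `diamond blastx` arguments. Whether `--block-size 0.5` and `-c1` should also be dropped on affected versions is not established in this thread, so the sketch leaves them alone.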