DIAMOND non-zero exit status (with --target-indexed?)

Hi,

With DIAMOND 2.1.11 and 2.1.12, I get segmentation faults when SingleM runs DIAMOND as a subprocess.

https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR7151488&display=metadata

pixi run singlem pipe --forward ~/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --otu-table /dev/stdout
06/18/2025 07:46:25 AM INFO: SingleM v0.19.0
06/18/2025 07:46:25 AM INFO: Retrieval successful. Location of backpack is: /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb
06/18/2025 07:46:26 AM INFO: Loaded 59 SingleM packages
06/18/2025 07:46:26 AM INFO: Using as input 1 different sequence files e.g. /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz
06/18/2025 07:46:28 AM INFO: Filtering sequence files through DIAMOND blastx
06/18/2025 07:47:22 AM ERROR: Process (DIAMOND?) failed
Traceback (most recent call last):
  File "/mnt/hpccs01/home/woodcrob/git/singlem/.pixi/envs/default/bin/singlem", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/main.py", line 741, in main
    singlem.pipe.SearchPipe().run(
  File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/pipe.py", line 73, in run
    otu_table_object = self.run_to_otu_table(**kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/pipe.py", line 402, in run_to_otu_table
    raise e
  File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/pipe.py", line 395, in run_to_otu_table
    self._num_threads, self._working_directory).run_diamond(
                                                ^^^^^^^^^^^^
  File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/diamond_spkg_searcher.py", line 34, in run_diamond
    fwds = self._prefilter(dmnd, forward_read_files, False, performance_parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/hpccs01/home/woodcrob/git/singlem/singlem/diamond_spkg_searcher.py", line 95, in _prefilter
    qseqid_sseqid = extern.run(cmd)
                    ^^^^^^^^^^^^^^^
  File "/mnt/hpccs01/home/woodcrob/git/singlem/.pixi/envs/default/lib/python3.12/site-packages/extern/__init__.py", line 41, in run
    raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command diamond blastx --outfmt 6 qseqid full_qseq sseqid --max-target-seqs 1 --evalue 0.01 --block-size 0.5 --target-indexed -c1 --min-orf 24 --query-gencode 4 --threads 1 --query /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --db /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd | tee >(sed 's/^/>/; s/\t/\n/; s/\t.*//' > /tmp/tmps0pbyx0u/prefilter_forward/20110600_E1D.1.fna) | awk '{print $1,$3}' returned non-zero exit status 139.
STDERR was: b"bash: line 1: 375636 Segmentation fault      (core dumped) diamond blastx --outfmt 6 qseqid full_qseq sseqid --max-target-seqs 1 --evalue 0.01 --block-size 0.5 --target-indexed -c1 --min-orf 24 --query-gencode 4 --threads 1 --query /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --db /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd\n     375637 Done                    | tee >(sed 's/^/>/; s/\\t/\\n/; s/\\t.*//' > /tmp/tmps0pbyx0u/prefilter_forward/20110600_E1D.1.fna)\n     375638 Done                    | awk '{print $1,$3}'\n"STDOUT was: b''
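
For reference, the exit status 139 reported above fits the usual shell convention of 128 plus the signal number, which would make it signal 11 (SIGSEGV). Below is a minimal sketch of decoding such a status. The helper name `describe_exit_status` is made up for illustration and is not part of SingleM or extern, and the mapping assumes POSIX-style signal numbering.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell exit status, assuming the 128 + N convention for signals.

    Hypothetical helper for illustration only.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform
            return f"exit status {status} (unrecognised signal {signum})"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {status}"


if __name__ == "__main__":
    # 139 is the status from the traceback above
    print(describe_exit_status(139))
```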

Drilling down, DIAMOND itself appears to be failing when run directly (though maybe it was bash that was segfaulting, since no segfault message shows up here?), e.g.

(conda)cpu1n022:20250618:~/git/singlem$ pixi run diamond blastx --outfmt 6 qseqid full_qseq sseqid --max-target-seqs 1 --evalue 0.01 --block-size 0.5 --target-indexed -c1 --min-orf 24 --query-gencode 4 --threads 1 --query /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --db /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd -o test.out
diamond v2.1.12.166 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
#Target sequences to report alignments for: 1
Opening the database...  [0.005s]
Database: /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd (type: Diamond database, sequences: 105024, letters: 8081808)
Block size = 500000000
Opening the input file...  [0.17s]
Opening the output file...  [0s]
Loading query sequences...  [4.831s]
Masking queries...  [26.176s]
Algorithm: Double-indexed
Seeking in database...  [0s]
Loading reference sequences...  [0.01s]
Masking reference...  [0.338s]
Initializing temporary storage...  [0.012s]
Building reference histograms...  [0.141s]
Allocating buffers...  [0s]
Loading database seed index...  [0.011s]
Processing query block 1, reference block 1/1, shape 1/2.
Building reference seed array...  [0.055s]
Building query seed array... (conda)cpu1n022:20250618:~/git/singlem$ echo $?
1

Removing some options (--block-size 0.5, --target-indexed and -c1), the run completes:

$ pixi run diamond blastx --outfmt 6 qseqid full_qseq sseqid --max-target-seqs 1 --evalue 0.01 --min-orf 24 --query-gencode 4 --threads 1 --query /home/woodcrob/m/abisko/data/flat20150213/20110600_E1D.1.fq.gz --db /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd -o test.out
diamond v2.1.12.166 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
#Target sequences to report alignments for: 1
Opening the database...  [0.006s]
Database: /mnt/hpccs01/home/woodcrob/git/singlem/db/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb/payload_directory/prefilter.fna.dmnd (type: Diamond database, sequences: 105024, letters: 8081808)
Block size = 2000000000
Opening the input file...  [0.17s]
Opening the output file...  [0s]
Loading query sequences...  [19.764s]
Masking queries...  [104.82s]
Algorithm: Double-indexed
Building query histograms...  [56.316s]
Seeking in database...  [0s]
Loading reference sequences...  [0.009s]
Masking reference...  [0.339s]
Initializing temporary storage...  [0.031s]
Building reference histograms...  [0.239s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array...  [0.099s]
Building query seed array...  [23.402s]
Computing hash join...  [4.114s]
Masking low complexity seeds...  [0.096s]
Searching alignments...  [0.586s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
...
Computing alignments...  [0.867s]
Deallocating reference...  [0s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0.002s]
Loading query sequences...  [0s]
Closing the input file...  [0s]
Closing the output file...  [0.007s]
Closing the database...  [0s]
Cleaning up...  [0s]
Total time = 480.815s
Reported 38659 pairwise alignments, 38659 HSPs.
38659 queries aligned.
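
As a possible stopgap on the calling side, the three options dropped in the second run could be filtered out of the argument list before DIAMOND is invoked. This is only a sketch: `strip_suspect_options` and `SUSPECT_OPTIONS` are hypothetical names, not SingleM code, and it does not establish which of the three options is responsible.

```python
from typing import Dict, List

# Options removed in the successful run above, mapped to the number of
# values that follow each one on the command line.
SUSPECT_OPTIONS: Dict[str, int] = {
    "--block-size": 1,
    "--target-indexed": 0,
    "-c1": 0,
}


def strip_suspect_options(args: List[str]) -> List[str]:
    """Return a copy of args without the suspect options and their values."""
    kept: List[str] = []
    to_skip = 0
    for arg in args:
        if to_skip > 0:
            # This token is the value of an option that was just dropped
            to_skip -= 1
            continue
        if arg in SUSPECT_OPTIONS:
            to_skip = SUSPECT_OPTIONS[arg]
            continue
        kept.append(arg)
    return kept


if __name__ == "__main__":
    original = [
        "diamond", "blastx", "--max-target-seqs", "1", "--evalue", "0.01",
        "--block-size", "0.5", "--target-indexed", "-c1",
        "--min-orf", "24", "--query-gencode", "4", "--threads", "1",
    ]
    print(" ".join(strip_suspect_options(original)))
```

The cost is speed: the run without these options took about 480 s in total.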

I wonder if this is a backwards incompatibility in the index files? I am pretty sure they were created with 2.1.10.

Thanks in advance, ben

Jun 17 '25 22:06 wwood

FWIW the specific dataset is here, but the problem shows up for lots of (not tiny?) samples: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR7151488&display=metadata

Jun 17 '25 22:06 wwood

Looks like there is a problem with --target-indexed. It will be fixed. What's your reason for using it? It was made for very particular applications only.

Jun 19 '25 06:06 bbuchfink

We found it was faster for our particular use case (a quite small DB). Thanks for the fix.

Jun 19 '25 07:06 wwood

It should be fixed in the latest release. The index format has never been changed.

Jul 23 '25 11:07 bbuchfink