DIA-NN error with a big dataset

Hi @vdemichev, I'm trying to run a large experiment with quantms. At the empirical library assembly step (assemble_empirical_library), I got the following error:

pst_prd@codon-dm-05:/hps/nobackup/juan/pride/reanalysis/absolute-expression/platelet/PXD039236$ tail -n 500 -f work/40/916f1229262d23cf0064ad40e0ef38/assemble_empirical_library.log 
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Thu Jan 11 19:17:15 2024
Logical CPU cores: 48
Thread number set to 48
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Existing .quant files will be used
A fast algorithm will be used to select the MS2 mass accuracy setting
Mass accuracy will be determined separately for different runs
Scan windows will be inferred separately for different runs
A spectral library will be generated
DIA-NN will optimise the mass accuracy separately for each run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

15620 files will be processed
[0:00] Loading spectral library lib.predicted.speclib
[0:04] Library annotated with sequence database(s): Homo-sapiens-uniprot-reviewed-entrap-contaminants-202310.fasta
[0:04] Protein names missing for some isoforms
[0:04] Gene names missing for some isoforms
[0:04] Library contains 20676 proteins, and 20183 genes
[0:06] Spectral library loaded: 41081 protein isoforms, 65581 protein groups and 7397781 precursors in 3688943 elution groups.
[0:06] Initialising library

[0:32] Cross-run analysis
[0:32] Reading quantification information: 15620 files
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_fill_insert

The command is really large because the analysis covers more than 15K files (15,620 runs in the log above). Here is a summary of the command:

diann {all the mzML files} \
        --lib lib.predicted.speclib \
        --threads 48 \
        --out-lib empirical_library.tsv \
        --verbose 3 \
        --rt-profiling \
        --temp ./quant/ \
        --use-quant \
        --quick-mass-acc --individual-mass-acc \
        --individual-windows \
        --gen-spec-lib \
        2>&1 | tee assemble_empirical_library.log

Jan 11 '24 19:01 ypriverol

Could be out of memory.

Jan 11 '24 20:01 vdemichev
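For a sense of scale, the log reports 15,620 runs and 7,397,781 library precursors. The arithmetic below is only a back-of-the-envelope illustration: it assumes, hypothetically, one value per run and precursor pair stored in 4 bytes (e.g. a single-precision float). The log does not say how DIA-NN actually sizes its buffers during cross-run analysis.

```latex
\begin{aligned}
N_{\text{runs}} \times N_{\text{prec}} &= 15\,620 \times 7\,397\,781 = 115\,553\,339\,220 \approx 1.16 \times 10^{11},\\
115\,553\,339\,220 &\gg 2^{31} - 1 = 2\,147\,483\,647,\\
4\ \text{bytes} \times 115\,553\,339\,220 &= 462\,213\,356\,880\ \text{bytes} \approx 430\ \text{GiB}.
\end{aligned}
```

Separately, with GCC's libstdc++ a std::length_error whose message is vector::_M_fill_insert is thrown when a std::vector is asked to grow beyond its max_size(), whereas a request that the available RAM cannot satisfy typically surfaces as std::bad_alloc or as the process being killed. The error may therefore point to an oversized or overflowed size computation rather than the node literally running out of memory; in either case, reducing the number of runs, as suggested in this thread, shrinks the request.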

This is the DIA-based library creation step; it can be done based on a subset of runs.

Jan 11 '24 20:01 vdemichev

@vdemichev, can you suggest a smart way of selecting the subset of runs?

Jan 12 '24 08:01 ypriverol

With this number of runs, I would just recommend selecting them at random.

Jan 12 '24 11:01 vdemichev
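A minimal sketch of the random selection suggested above, assuming the run paths are available as a list of strings. The function name select_random_runs, the run names, the subset size of 500 and the seed are hypothetical placeholders, not values from quantms or DIA-NN. Sorting before sampling and using a fixed seed make the subset reproducible across pipeline reruns:

```python
import random


def select_random_runs(runs, n, seed=0):
    """Return a reproducible random subset of at most `n` runs, in sorted order.

    Sorting the input first makes the result independent of the order in which
    the files were listed; a fixed seed makes reruns pick the same subset.
    """
    ordered = sorted(runs)
    if n >= len(ordered):
        # Fewer runs than requested: use all of them.
        return ordered
    rng = random.Random(seed)  # private generator, leaves the global RNG untouched
    return sorted(rng.sample(ordered, n))


if __name__ == "__main__":
    # Hypothetical run names standing in for the 15,620 mzML files.
    all_runs = [f"run_{i:05d}.mzML" for i in range(15620)]
    # Hypothetical subset size and seed; pick a size that fits the available memory.
    subset = select_random_runs(all_runs, n=500, seed=2024)
    print(f"{len(subset)} of {len(all_runs)} runs selected, e.g. {subset[:3]}")
```

Only the library assembly step would use this subset; as noted above, the DIA-based library does not need to be built from all 15,620 runs.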

@vdemichev, I will close the issue once we implement your suggestion in quantms and see whether we can finish the dataset.

Jan 12 '24 11:01 ypriverol
