kmdiff
Kmtricks does not process partitions
Hello, I am trying to run kmdiff on our cluster, but it is proving difficult. In particular, I encounter this strange behavior: kmtricks computes the super-k-mers and counts the partitions but does not write them to disk, which leads to output that at first seems normal:
[2023-11-02 10:08:37.742] [info] exec: kmtricks pipeline --file /mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData/Stage_0_vs_controls/kmdiff_fof.txt --run-dir /mnt/ssd/LM/Stage_0_vs_controls/tmp --kmer-size 31 --hard-min 1 --threads 50 --minimizer-type 0 --mode kmer:count:bin --recurrence-min 1 --minimizer-size 10 --repartition-type 0 --nb-partitions 2 --cpr --until count --hist
[2023-11-02 10:08:37.814] [info] Run with Kmer<32> - uint64_t implementation
[2023-11-02 10:08:37.876] [info] Compute configuration...
[2023-11-02 10:08:37.879] [info] 324 samples found (324 read files).
[2023-11-02 10:09:42.269] [info] Use 4 partitions.
[2023-11-02 10:09:42.271] [info] Compute minimizer repartition...
Compute SuperK [==================================================] [03h:09m:18s]
Count partitions [==================================================] [03h:09m:19s]
[2023-11-02 13:23:06.444] [info] Done in 03h14m28s - Peak RSS -> 78326.06 MB.
[2023-11-02 13:23:07.315] [info] kmtricks exit normally with (0).
[2023-11-02 13:23:08.662] [warning] -c/--correction benjamini: all significants k-mers will live in memory.
[2023-11-02 13:23:08.700] [info] Done in 00s, Peak RSS -> 18 MB.
When looking at the files:
guelou01@vls136:/mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData$ ls /mnt/ssd/LM/Stage_0_vs_controls/* -lh
-rw-r--r-- 1 guelou01 guelou01 0 Nov 1 11:03 /mnt/ssd/LM/Stage_0_vs_controls/case_kmers.fasta
-rw-r--r-- 1 guelou01 guelou01 0 Nov 1 11:03 /mnt/ssd/LM/Stage_0_vs_controls/control_kmers.fasta
-rw-r--r-- 1 guelou01 guelou01 37 Nov 2 13:23 /mnt/ssd/LM/Stage_0_vs_controls/options.bin
/mnt/ssd/LM/Stage_0_vs_controls/partitions:
total 0
-rw-r--r-- 1 guelou01 guelou01 0 Nov 2 09:52 p4_uncorrected
-rw-r--r-- 1 guelou01 guelou01 0 Nov 2 09:52 p5_uncorrected
-rw-r--r-- 1 guelou01 guelou01 0 Nov 2 09:52 p6_uncorrected
-rw-r--r-- 1 guelou01 guelou01 0 Nov 2 00:29 p7_uncorrected
-rw-r--r-- 1 guelou01 guelou01 0 Nov 2 00:29 p8_uncorrected
-rw-r--r-- 1 guelou01 guelou01 0 Nov 2 00:29 p9_uncorrected
/mnt/ssd/LM/Stage_0_vs_controls/tmp:
total 152K
-rw-r--r-- 1 guelou01 guelou01 604 Nov 2 10:08 build_infos.txt
drwxr-xr-x 2 guelou01 guelou01 4.0K Nov 2 10:09 config_gatb
drwxr-xr-x 6 guelou01 guelou01 4.0K Nov 2 10:09 counts
drwxr-xr-x 2 guelou01 guelou01 4.0K Nov 2 10:08 filters
drwxr-xr-x 2 guelou01 guelou01 4.0K Nov 2 10:08 fpr
-rw-r--r-- 1 guelou01 guelou01 36 Nov 2 10:09 hash.info
drwxr-xr-x 2 guelou01 guelou01 12K Nov 2 13:23 histograms
drwxr-xr-x 2 guelou01 guelou01 4.0K Nov 2 10:08 howde_index
-rw-r--r-- 1 guelou01 guelou01 303 Nov 2 13:23 kmdiff-count.opt
-rwxrwx--- 1 guelou01 guelou01 60K Nov 2 10:08 kmtricks.fof
drwxr-xr-x 2 guelou01 guelou01 4.0K Nov 2 10:08 matrices
drwxr-xr-x 2 guelou01 guelou01 4.0K Nov 2 10:08 merge_infos
drwxr-xr-x 2 guelou01 guelou01 4.0K Nov 2 10:12 minimizers
-rw-r--r-- 1 guelou01 guelou01 507 Nov 2 10:08 options.txt
drwxr-xr-x 2 guelou01 guelou01 12K Nov 2 13:21 partition_infos
drwxr-xr-x 2 guelou01 guelou01 4.0K Nov 2 10:12 repartition_gatb
-rw-r--r-- 1 guelou01 guelou01 43 Nov 2 13:23 run_infos.txt
drwxr-xr-x 326 guelou01 guelou01 12K Nov 2 13:10 superkmers
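The zero-byte files in the listing above can also be detected automatically before `kmdiff diff` is launched. The following is a minimal sketch, not part of kmdiff or kmtricks: `list_empty_files` and `has_nonempty_output` are hypothetical helper names, and the check assumes that a healthy output directory contains at least one file and no empty ones.

```shell
# Hypothetical helper: print every zero-byte regular file under a directory.
list_empty_files() {
    find "$1" -type f -size 0 | sort
}

# Hypothetical helper: succeed only if the directory exists, holds at least
# one non-empty file, and holds no zero-byte file.
has_nonempty_output() {
    [ -d "$1" ] || return 1
    [ -n "$(find "$1" -type f -size +0 | head -n 1)" ] || return 1
    [ -z "$(find "$1" -type f -size 0 | head -n 1)" ]
}

# Example use (path taken from the listing above):
#   list_empty_files /mnt/ssd/LM/Stage_0_vs_controls/partitions
#   has_nonempty_output /mnt/ssd/LM/Stage_0_vs_controls/partitions || echo "empty partitions"
```

On the listing above, `list_empty_files` would report all six `p*_uncorrected` files.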
Relevant part of the script that starts kmdiff:
singularity run --bind "/mnt/scratch/,/mnt/ssd/LM/,/mnt/projects_tn01/KMER_analysis/,/mnt/loreal_tn04/projects/LORE053_MICRA/shotgun_loreal/preprocessing/good/" /mnt/projects_tn01/KMER_analysis/tools/kmdiff.sif count --file /mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData/Stage_0_vs_controls/kmdiff_fof.txt --run-dir /mnt/ssd/LM/Stage_0_vs_controls/tmp --threads 50 --nb-partitions 2
singularity run --bind "/mnt/ssd/LM/,/mnt/scratch/,/mnt/projects_tn01/KMER_analysis/,/mnt/loreal_tn04/projects/LORE053_MICRA/shotgun_loreal/preprocessing/good/" /mnt/projects_tn01/KMER_analysis/tools/kmdiff.sif diff --km-run /mnt/ssd/LM/Stage_0_vs_controls/tmp --output-dir /mnt/ssd/LM/Stage_0_vs_controls/ --nb-controls 73 --nb-cases 251 --threads 50 --correction benjamini
The Singularity image is built on Ubuntu 20.04 with this. Build infos:
kmtricks v1.2.0
- HOST -
build host: Linux-5.19.0-46-generic
run host: Linux 6.1.0-12-amd64
- BUILD -
c compiler: GNU 9.4.0
cxx compiler: GNU 9.4.0
conda: OFF
static: OFF
native: ON
modules: OFF
socks: OFF
howde: OFF
dev: OFF
kmer: 32,64,96,128
max_c: 4294967295
- GIT SHA1 / VERSION -
kmtricks: 5d34ca5
sdsl: c32874c
bcli: 3e4f493
fmt: 0544a227
kff: 97d135e
lz4: 4de56b3
spdlog: v1.2.1-1811-g5b4c4f3f
xxhash: 6853ddc
gtest: release-1.8.0-2774-g96f4ce02
croaring: v0.3.3-17-g2d5c927
robin-hood-hasing: 24b3f50
turbop: 4ab9f5b
cfrcat: 2f9da97
indicators: v1.9-36-gcdcff01
Contact: [email protected]
So far I have tried restarting the diff command to get it to process the partitions. Sometimes I get a few GB in 3 files and then kmdiff crashes with SIGSEGV (11). Sometimes I get lucky and it yields a result without error. Any help is greatly appreciated. Cheers!
On a side note, I tried restarting the run several times. kmdiff resumes by processing the partitions and can complete the process, but with different results, as shown here:
guelou01@vls136:/mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData$ bash ../../gitlab/Kmdiff.sh -f Stage_0_vs_controls/kmdiff_fof.txt -o /mnt/ssd/LM/Stage_0_vs_controls/ -t 100 -nc 251 -nC 73
WARNING: /mnt/ssd/LM/Stage_0_vs_controls directory already exists.
[2023-11-19 12:52:53.750] [info] exec: kmtricks pipeline --file /mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData/Stage_0_vs_controls/kmdiff_fof.txt --run-dir /mnt/ssd/LM/Stage_0_vs_controls/tmp --kmer-size 31 --hard-min 1 --threads 100 --minimizer-type 0 --mode kmer:count:bin --recurrence-min 1 --minimizer-size 10 --repartition-type 0 --nb-partitions 6 --cpr --until count --hist
[CheckFailedError] -> [--run-dir /mnt/ssd/LM/Stage_0_vs_controls/tmp] ~ Directory already exists!
[2023-11-19 12:52:53.794] [error] ExternalExecFailed - /opt/kmdiff/kmdiff_build/bin/kmtricks exit with 1.
[2023-11-19 12:52:54.142] [warning] -c/--correction benjamini: all significants k-mers will live in memory.
[2023-11-19 12:52:54.146] [info] Process partitions
[2023-11-19 16:50:04.260] [info] Partitions processed (03h57m10s) [03h:57m:09s]
[2023-11-19 16:50:04.260] [info] 147241945/4237569736 significant k-mers.
[2023-11-19 16:50:04.260] [info] Before correction: 73807788 (control), 73434157 (case).
[2023-11-19 16:50:04.262] [info] Aggregate partitions and apply significance correction...
[2023-11-19 16:50:04.319] [error] Killed after receive Segmentation fault:SIGSEGV(11) signal. Demangled backtrace dumped at ./kmdiff_backtrace.log. If the problem persists, please open an issue with the return of 'kmdiff infos' and the content of ./kmdiff_backtrace.log
[2023-11-19 16:50:04.320] [error] Killed after receive Segmentation fault:SIGSEGV(11) signal. Demangled backtrace dumped at ./kmdiff_backtrace.log. If the problem persists, please open an issue with the return of 'kmdiff infos' and the content of ./kmdiff_backtrace.log
guelou01@vls136:/mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData$ bash ../../gitlab/Kmdiff.sh -f Stage_0_vs_controls/kmdiff_fof.txt -o /mnt/ssd/LM/Stage_0_vs_controls/ -t 100 -nc 251 -nC 73
WARNING: /mnt/ssd/LM/Stage_0_vs_controls directory already exists.
[2023-11-20 09:48:38.560] [info] exec: kmtricks pipeline --file /mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData/Stage_0_vs_controls/kmdiff_fof.txt --run-dir /mnt/ssd/LM/Stage_0_vs_controls/tmp --kmer-size 31 --hard-min 1 --threads 100 --minimizer-type 0 --mode kmer:count:bin --recurrence-min 1 --minimizer-size 10 --repartition-type 0 --nb-partitions 6 --cpr --until count --hist
[CheckFailedError] -> [--run-dir /mnt/ssd/LM/Stage_0_vs_controls/tmp] ~ Directory already exists!
[2023-11-20 09:48:38.616] [error] ExternalExecFailed - /opt/kmdiff/kmdiff_build/bin/kmtricks exit with 1.
[2023-11-20 09:48:39.223] [warning] -c/--correction benjamini: all significants k-mers will live in memory.
[2023-11-20 09:48:39.233] [info] Done in 00s, Peak RSS -> 22 MB.
guelou01@vls136:/mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData$ bash ../../gitlab/Kmdiff.sh -f Stage_0_vs_controls/kmdiff_fof.txt -o /mnt/ssd/LM/Stage_0_vs_controls/ -t 100 -nc 251 -nC 73
WARNING: /mnt/ssd/LM/Stage_0_vs_controls directory already exists.
[2023-11-20 09:48:53.114] [info] exec: kmtricks pipeline --file /mnt/projects_tn01/KMER_analysis/DATA/ColorectalCancerData/Stage_0_vs_controls/kmdiff_fof.txt --run-dir /mnt/ssd/LM/Stage_0_vs_controls/tmp --kmer-size 31 --hard-min 1 --threads 100 --minimizer-type 0 --mode kmer:count:bin --recurrence-min 1 --minimizer-size 10 --repartition-type 0 --nb-partitions 6 --cpr --until count --hist
[CheckFailedError] -> [--run-dir /mnt/ssd/LM/Stage_0_vs_controls/tmp] ~ Directory already exists!
[2023-11-20 09:48:53.166] [error] ExternalExecFailed - /opt/kmdiff/kmdiff_build/bin/kmtricks exit with 1.
[2023-11-20 09:48:53.658] [warning] -c/--correction benjamini: all significants k-mers will live in memory.
[2023-11-20 09:48:53.662] [info] Process partitions
[2023-11-20 13:47:15.513] [info] Partitions processed (03h58m21s) [03h:58m:21s]
[2023-11-20 13:47:15.513] [info] 441098215/12824941157 significant k-mers.
[2023-11-20 13:47:15.513] [info] Before correction: 221073697 (control), 220024518 (case).
[2023-11-20 13:47:15.514] [info] Aggregate partitions and apply significance correction...
[2023-11-20 13:47:15.551] [error] Killed after receive Segmentation fault:SIGSEGV(11) signal. Demangled backtrace dumped at ./kmdiff_backtrace.log. If the problem persists, please open an issue with the return of 'kmdiff infos' and the content of ./kmdiff_backtrace.log
Thus I am not sure I can rely on this trick to get past the issue, even in the runs where I get past the SIGSEGV.
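In the reruns above, kmtricks refuses to start because the run directory already exists, and the diff step then runs against whatever the earlier attempt left behind. To rule out stale state between attempts, a guard along these lines could go at the top of the wrapper script. This is only a sketch: `ensure_fresh_run_dir` is a hypothetical helper, not a kmdiff or kmtricks feature, and it assumes that each attempt should start from a run directory that does not exist yet.

```shell
# Hypothetical guard: fail early if the run directory is already present,
# instead of letting later steps work on leftovers from a previous attempt.
ensure_fresh_run_dir() {
    if [ -e "$1" ]; then
        echo "run dir '$1' already exists; remove it or choose another path" >&2
        return 1
    fi
}

# Example use (path taken from the logs above):
#   ensure_fresh_run_dir /mnt/ssd/LM/Stage_0_vs_controls/tmp || exit 1
```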