
Feature Request: More verbose logging?

Open · schorlton opened this issue 1 year ago · 1 comment

Please report

  • [x] version of RNA-Bloom with java -jar RNA-Bloom.jar -version
  • [x] version of java with java -version
  • [x] exact command used to run RNA-Bloom

Same software versions as https://github.com/bcgsc/RNA-Bloom/issues/46

Thanks for the amazing tool. I suspect the following job failed because there wasn't enough input data, but the log message is fairly vague. Was it a corrupt FASTQ? Did I run out of memory (OOM)? I can't tell from this message. Could the error state more clearly what went wrong, so I can decide whether I need to investigate further? I think there is a separate error message that is sometimes triggered when there isn't enough input data. Thanks!

rnabloom -outdir rnabloom_out -t 8 -long filtered.fastq -ntcard
--
ERROR:root:stdout: RNA-Bloom v2.0.0
args: [-outdir, rnabloom_out, -t, 8, -long, filtered.fastq, -ntcard]
name:   rnabloom
outdir: rnabloom_out
WARNING: Output directory does not exist!
Created output directory at `rnabloom_out`
K-mer counting with ntCard...
Running command: `ntcard -t 8 -k 25 -c 65535 -p rnabloom_out/rnabloom @rnabloom_out/rnabloom.ntcard.readslist.txt`...
Parsing histogram file `rnabloom_out/rnabloom_k25.hist`...
Unique k-mers (k=25):     448
Unique k-mers (k=25,c>1): 0
WARNING: 0 non-singleton (c>1) k-mers detected!
K-mer counting completed in 3.367s
Bloom filters          Memory (GB)
====================================
de Bruijn graph:       9.901123E-7
k-mer counting:        7.9208985E-6
====================================
Total:                 8.911011E-6
> Stage 1: Construct graph from reads (k=25)
Parsing `filtered.fastq`...
Parsed 4 sequences in 0.004s
DBG Bloom filter FPR:                 1.06 %
Counting Bloom filter FPR:            0.0241 %
> Stage 1 completed in 0.009s
> Stage 2: Correct long reads for "rnabloom"
Parsing `filtered.fastq`...
Corrected Read Lengths Sampling Distribution (n=4)
min	q1	med	q3	max
153	155	160	166	170
Parsed 4 sequences.
Kept:      4	(100.0 %)
Discarded: 0	(0.0 %)
Corrected reads in 0.224s
Extracting seed sequences...
strobemers: n=3, k=11, wMin=12, wMax=61, depth=3
Bloom filter FPR:	0.389 %
before: 4	after: 4 (100.0 %)
too short: 0
Extraction completed in 0.109s
> Stage 2 completed in 0.333s
> Stage 3: Assemble long reads for "rnabloom"
Overlapping sequences...
Parsed 0 overlap records in 0.0s
total reads:    4
- unique:      0	(0.0 %)
- multi-seg: 0
Unique reads extracted in 0.001s
ERROR: Error assembling long reads!

schorlton, Aug 08 '22 19:08

Hi @schorlton,

One of our collaborators reported the same issue (not on GitHub) when testing RNA-Bloom with just 5 reads.

This error arose because the default minimum read depth (-lrrd) is 3 and no sequences meet this criterion. I will make an update with a more descriptive message.

Thanks, Ka Ming

kmnip, Aug 08 '22 20:08
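
Until a more descriptive message ships, the low-input case described in this thread can be spotted from the existing log. The following is a minimal sketch, not part of RNA-Bloom: the `diagnose` helper and its hint wording are hypothetical, and the patterns are copied from the log lines quoted above, so other versions may phrase them differently.

```python
import re

# Patterns taken from the log quoted in this issue (RNA-Bloom v2.0.0).
# Assumption: other versions may word these lines differently.
CHECKS = [
    (r"Unique k-mers \(k=\d+,c>1\):\s*(\d+)",
     "no k-mer occurs more than once; the input is probably too small or too shallow"),
    (r"Parsed (\d+) overlap records",
     "no read overlaps were found; no sequence can reach the minimum read depth (-lrrd)"),
]


def diagnose(log_text):
    """Return hints for every zero-count indicator found in the log text."""
    hints = []
    for pattern, message in CHECKS:
        match = re.search(pattern, log_text)
        # Only flag the indicator when the reported count is exactly zero.
        if match and int(match.group(1)) == 0:
            hints.append(message)
    return hints
```

Calling `diagnose` on the text of the log above should flag both indicators. An empty result would suggest the failure has a different cause, such as a corrupt FASTQ or an out-of-memory condition, which this sketch does not attempt to detect.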