Major behaviour change from 1.0.7 to 1.1.1

Open tseemann opened this issue 7 years ago • 4 comments

Today I upgraded from Lighter 1.0.7 to 1.1.1. I first noticed a problem when 1.1.1 wrote a different number of reads to each of the two output files, and then saw that it was also passing far fewer reads overall.

This is the command line:

lighter -od . -r R1.fq.gz -r R2.fq.gz -K 32 4000000 -t 72 -maxcor 2

This is the difference in read counts:

Files   R1.fq.gz
Reads   3747457  # original reads
Files   R2.fq.gz
Reads   3747457

Files   1.0.7-R1.cor.fq.gz
Reads   3747457  # none missing
Files   1.0.7-R2.cor.fq.gz
Reads   3747457

Files   1.1.1-R1.cor.fq.gz
Reads   2511489  # lots missing
Files   1.1.1-R2.cor.fq.gz
Reads   2511506  # has 17 more reads!

Any ideas?
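
For anyone who wants to run the same check on their own data, here is a minimal sketch of how the read counts above could be reproduced. It assumes standard 4-line FASTQ records (no wrapped sequence lines) and gzip-compressed files. The function names `count_fastq_reads` and `check_pairs` are made up for this illustration and are not part of Lighter.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A truncated file or wrapped sequence lines would break the assumption.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_pairs(r1_path, r2_path):
    """Return (R1 count, R2 count, counts match) for a pair of FASTQ files."""
    n1 = count_fastq_reads(r1_path)
    n2 = count_fastq_reads(r2_path)
    return n1, n2, n1 == n2
```

Calling `check_pairs("R1.cor.fq.gz", "R2.cor.fq.gz")` on a healthy paired-end run should return two equal counts. With the 1.1.1 output described above, the third element would be `False` because the files differ by 17 reads.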

tseemann avatar Aug 16 '16 01:08 tseemann

I tested again on my data sets and could not reproduce the bug you encountered. Is there a way for me to access the data set you used? If not, can you show me the correction summary that Lighter prints to the screen? Thanks.

mourisl avatar Aug 16 '16 02:08 mourisl

I found the issue. If you compile with the default -O2 option, it works. In Linuxbrew, I used the system CXXFLAGS, which sets -Os (optimize for size), and that causes the bug!
CC: @sjackman

See the output messages below:

Files   R1.fq.gz
Reads   3747457

This is g++ -O2 (which works)

$ ./lighter-1.1.1-O2 -od 1.1.1-O2 -r R1.fq.gz -r R2.fq.gz -K 32 4000000 -t 72 -maxcor 2
[2016-08-17 00:11:57] =============Start====================
[2016-08-17 00:11:57] Scanning the input files to infer alpha(sampling rate)
[2016-08-17 00:12:04] Average coverage is 141.346 and alpha is 0.050
[2016-08-17 00:12:05] Bad quality threshold is "B"
[2016-08-17 00:12:15] Finish sampling kmers
[2016-08-17 00:12:15] Bloom filter A's false positive rate: 0.006326
[2016-08-17 00:12:24] Finish storing trusted kmers
[2016-08-17 00:12:56] Finish error correction
Processed 7494914 reads:
        7042749 are error-free
        Corrected 579197 bases(1.280942 corrections for reads with errors)
        Trimmed 0 reads with average trimmed bases 0.000000
        Discard 0 reads

This is g++ -Os with missing reads!

$ ./lighter-1.1.1-Os -od 1.1.1-Os -r R1.fq.gz -r R2.fq.gz -K 32 4000000 -t 72 -maxcor 2
[2016-08-17 00:13:38] =============Start====================
[2016-08-17 00:13:38] Scanning the input files to infer alpha(sampling rate)
[2016-08-17 00:13:46] Average coverage is 141.346 and alpha is 0.050
[2016-08-17 00:13:47] Bad quality threshold is "B"
[2016-08-17 00:13:57] Finish sampling kmers
[2016-08-17 00:13:57] Bloom filter A's false positive rate: 0.006326
[2016-08-17 00:14:06] Finish storing trusted kmers
[2016-08-17 00:14:32] Finish error correction
Processed 5022995 reads:
        4719925 are error-free
        Corrected 388478 bases(1.281809 corrections for reads with errors)
        Trimmed 0 reads with average trimmed bases 0.000000
        Discard 0 reads
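
The two summaries can be compared against the input counts directly. The -O2 build reports 7494914 processed reads, which is 2 x 3747457, while the -Os build reports 5022995, which is 2511489 + 2511506 and so matches the short output files. Below is a rough sketch of that check. It assumes the "Processed N reads:" line has the form shown in the logs above, and the helper names `processed_reads` and `missing_reads` are hypothetical.

```python
import re


def processed_reads(log_text):
    """Extract N from a 'Processed N reads' line in Lighter's screen output."""
    match = re.search(r"Processed (\d+) reads", log_text)
    if match is None:
        raise ValueError("no 'Processed N reads' line found")
    return int(match.group(1))


def missing_reads(log_text, input_counts):
    """Input reads (summed over all files) minus reads the log says were processed."""
    return sum(input_counts) - processed_reads(log_text)


# Figures copied from the two logs above; both inputs hold 3747457 reads.
inputs = [3747457, 3747457]
print(missing_reads("Processed 7494914 reads:", inputs))  # -O2 build
print(missing_reads("Processed 5022995 reads:", inputs))  # -Os build
```

A non-zero result means the run never saw some of the input reads at all. That fits the "Discard 0 reads" line in the -Os log: the reads are not being discarded by the corrector, they are simply never counted as processed.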

tseemann avatar Aug 16 '16 14:08 tseemann

Ping @mourisl - any ideas?

tseemann avatar Jul 01 '17 01:07 tseemann

As a workaround you can use ENV.O2 in the formula to use -O2 rather than the default -Os.

sjackman avatar Jul 05 '17 03:07 sjackman