fastq-multx

Demultiplexing from barcodes in headers

Open jenzopr opened this issue 6 years ago • 9 comments

Hi Joe,

I just ran across a Segmentation fault error when demultiplexing from barcodes in the header. All %.fastq.gz output files are created, but they are empty, so the error must occur after the outputs are opened. My call is fastq-multx -H -m1 -B barcodes.txt input.fastq.gz -o %.fastq.gz and a fastq record looks like:

@NS500475:199:HHML2BGX2:1:11101:21358:1116 2:N:0:1 AACCAATCGT
GCGGTTAAGAGTACTGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
AAAA/EEEAEEEEEEEE###############################################################

Maybe you can provide me with some directions on how demultiplexing from a barcode in the header is possible. Best and many thanks! Jens
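
For checking by hand which barcode a record like the one above carries, here is a minimal Python sketch. It assumes the barcode is the last whitespace-separated field of the header line, as in the example record; how fastq-multx -H itself parses the header is not shown in this thread, and header_barcode is a hypothetical helper name.

```python
def header_barcode(header_line: str) -> str:
    """Return the last whitespace-separated field of a FASTQ header line.

    Assumption: the barcode is the final token, as in the record quoted
    above. This is not taken from the fastq-multx source.
    """
    if not header_line.startswith("@"):
        raise ValueError("not a FASTQ header line")
    return header_line.rstrip("\n").split()[-1]


header = "@NS500475:199:HHML2BGX2:1:11101:21358:1116 2:N:0:1 AACCAATCGT"
print(header_barcode(header))
```

Comparing the extracted field against the second column of barcodes.txt is a quick way to confirm that the headers contain the barcodes you expect before blaming the tool.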

jenzopr avatar Mar 29 '18 08:03 jenzopr

I investigated a bit and have an example that reproduces the error. It is not primarily related to barcodes in headers, but to the handling of unmatched barcodes in the single-end case:

The content of input.fastq:

@NS500475:199:HHML2BGX2:1:11101:12492:1053 1:N:0:1
TTGCAAGATCC
+
AAAAA#EEEEE
@NS500475:199:HHML2BGX2:1:11101:26088:1053 1:N:0:1
TCCAANCATCT
+
AAAAA#EEEEE
@NS500475:199:HHML2BGX2:1:11101:17308:1053 1:N:0:1
ATGCCNTAATT
+
6AAAA#EEAAA
@NS500475:199:HHML2BGX2:1:11101:23038:1053 1:N:0:1
TGCGTNGGCCG
+
AAAAA#EEEEE
@NS500475:199:HHML2BGX2:1:11101:6451:1053 1:N:0:1
TTGCANGCAT
+
AAAAA#AAEE

The barcodes.txt file:

cell1	TTGCAGTCTAC
cell2	TTGCAGTTATG
cell3	TTGCCTATGGC
cell4	TTGCAGCGTCC
cell5	TTGCAGGCATC
cell6	TTGATTGCTCG
cell7	TTGATGCAATC
cell8	TTGATTCTTAA
cell9	TTGATTCAGAT
cell10	TTGCAAGATCC

The call gives me:

fastq-multx -D -m 1 -B barcodes.txt input.fastq -o %.fastq.gz
BC: 0 bc:TTGCAGTCTAC n:11
BC: 1 bc:TTGCAGTTATG n:11
BC: 2 bc:TTGCCTATGGC n:11
BC: 3 bc:TTGCAGCGTCC n:11
BC: 4 bc:TTGCAGGCATC n:11
BC: 5 bc:TTGATTGCTCG n:11
BC: 6 bc:TTGATGCAATC n:11
BC: 7 bc:TTGATTCTTAA n:11
BC: 8 bc:TTGATTCAGAT n:11
BC: 9 bc:TTGCAAGATCC n:11
Using Barcode File: barcodes.txt
End used: start
id: @NS500475:199:HHML2BGX2:1:11101:12492:1053 1:N:0:1, seq: TTGCAAGATCC 11, found bc: 9 bc:TTGCAAGATCC n:11, bestd: 0, next_best: 3, best: 9 cell10
id: @NS500475:199:HHML2BGX2:1:11101:26088:1053 1:N:0:1, seq: TCCAANCATCT 11, best: 10 unmatched
Segmentation fault

The same error occurs with paired-end sequences when -o %_R1.fastq -o %_R2.fastq is used instead of -o n/a -o %.fastq. HTH, Jens
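
The classification reported in the debug output can be cross-checked independently of fastq-multx. The sketch below is an assumption-laden stand-in, not the tool's algorithm: it counts plain positional mismatches against each barcode at the start of the read and applies the -m 1 threshold. The names hamming and classify are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions over the shared length of a and b."""
    return sum(x != y for x, y in zip(a, b))


# Barcodes copied from barcodes.txt above.
BARCODES = {
    "cell1": "TTGCAGTCTAC", "cell2": "TTGCAGTTATG", "cell3": "TTGCCTATGGC",
    "cell4": "TTGCAGCGTCC", "cell5": "TTGCAGGCATC", "cell6": "TTGATTGCTCG",
    "cell7": "TTGATGCAATC", "cell8": "TTGATTCTTAA", "cell9": "TTGATTCAGAT",
    "cell10": "TTGCAAGATCC",
}


def classify(seq, barcodes, max_mismatches=1):
    """Return (label, best_distance, next_best_distance) for one read.

    Assumption: the barcode sits at the start of the read and only
    substitutions are counted; the real tool may apply further checks.
    """
    dists = sorted((hamming(seq[:len(bc)], bc), name)
                   for name, bc in barcodes.items())
    best_d, best_name = dists[0]
    next_d = dists[1][0]
    label = best_name if best_d <= max_mismatches else "unmatched"
    return label, best_d, next_d


for read in ("TTGCAAGATCC", "TCCAANCATCT"):
    print(read, classify(read, BARCODES))
```

Under these assumptions the first read is assigned to cell10 with a best distance of 0 and a next-best distance of 3, and the second read is unmatched, which agrees with the debug lines above. The crash therefore coincides with the first read that goes to the unmatched output.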

jenzopr avatar Mar 29 '18 14:03 jenzopr

Hi Joe,

do you think you'll be able to fix the bug in the next few weeks? 😃

Best, Jens

jenzopr avatar Apr 09 '18 13:04 jenzopr

No, this would be a weekend/off-hours project and those are pretty booked these days.

brwnj avatar Apr 09 '18 20:04 brwnj

Uh, that's bad news, but understandable. I haven't programmed C++ in a while, but I will try to have a look and dig around if you don't mind.

jenzopr avatar Apr 10 '18 07:04 jenzopr

I'll happily review pull requests!

brwnj avatar Apr 10 '18 20:04 brwnj

Was this ever fixed?

dlebron12 avatar Jan 12 '22 23:01 dlebron12

I don't believe anyone ever looked into this further.

brwnj avatar Jan 13 '22 01:01 brwnj

I am also experiencing a similar issue when using the "-H" parameter for dual indexes in the header. I always get Segmentation fault (core dumped).

rikrdo89 avatar Jul 11 '23 15:07 rikrdo89

I looked more into the issue and, as has been said before, it has nothing to do with the headers. The program cannot handle single-end reads. A way around this is to provide the input fastq file twice and set n/a for one of the outputs, as follows:

fastq-multx -H -B indexes.txt mxtest-h_1.fastq mxtest-h_1.fastq -o %_1.fastq -o n/a

Hopefully someone will fix this issue at some point.
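
For scripted pipelines, the workaround above can be wrapped in a small helper. This is a sketch only: single_end_workaround is a hypothetical name, and it uses just the flags that appear in the command above (-H, -B, -o). It builds the argument list without running anything.

```python
import shlex


def single_end_workaround(barcode_file, fastq, out_pattern, header_barcodes=True):
    """Build the fastq-multx argv for the workaround described above.

    The same FASTQ file is passed twice and the second output is set
    to n/a, so only one set of demultiplexed files is written.
    """
    argv = ["fastq-multx"]
    if header_barcodes:
        argv.append("-H")  # take barcodes from the header, as in the thread
    argv += ["-B", barcode_file, fastq, fastq, "-o", out_pattern, "-o", "n/a"]
    return argv


cmd = single_end_workaround("indexes.txt", "mxtest-h_1.fastq", "%_1.fastq")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once fastq-multx is on the PATH; passing a list rather than a shell string avoids quoting problems with the % in the output pattern.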

rikrdo89 avatar Jul 11 '23 20:07 rikrdo89