Demuxing dual-indexed, single-end fastq files with indexes in header fails
I'm trying this tool for the first time to demultiplex a fastq file, Undetermined_L001_R1.fastq.gz, from an Illumina run that used dual indexes and single-end reads. Several of my attempts at running the tool have ended in a segmentation fault.
So I decided to test the tool using the scripts and data in the test folder of this repository. Whenever I use the -H parameter to specify that the indexes are in the read headers, the tool fails if I supply only one fastq file as input. However, when I supply two fastq files as input (entering the same file twice), the program runs without issues. My commands are below, with an excerpt of the data.
This fails:
$ fastq-multx -H -B indexes.txt test.fastq -o test_%.fastq
Using Barcode File: indexes.txt
Segmentation fault (core dumped)
but this runs (using the input file twice and suppressing one of the outputs, as a sort of workaround):
$ fastq-multx -H -B indexes.txt test.fastq test.fastq -o test_%.fastq -o n/a
Using Barcode File: indexes.txt
End used: start
Id Count File(s)
S1 17800 test_S1.fastq
S2 32100 test_S2.fastq
S3 3900 test_S3.fastq
...
The files I am using look like this:
==> index.txt <==
S1 TTACCGAC-CGTATTCG
S2 TCGTCTGA-TCAAGGAC
S3 TTCCAGGT-AAGCACTG
S4 TACGGTCT-GCAATGGA
S5 AAGACCGT-CAATCGAC
==> test.fastq <==
@A00929:83:HL75TDRXX:1:2101:13919:1047 1:N:0:TTACCGAC+CGTATTCG
CATATTGATAGTTCGCACAGGTAG
+
FFFFFFFFFFFFFFFFFFFFFFFF
@A00929:83:HL75TDRXX:1:2101:14009:1047 1:N:0:TCGTCTGA+TCAAGGAC
GTGCGTATCTATCAAAAATGTATA
+
I installed fastq-multx using conda on Ubuntu 20.04.
It's been a very long time since I've worked in C, but I looked at the code and have a mild hunch... Try duplicating the seq defline on the qual defline and see if you still get a segfault.
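For anyone who wants to try that suggestion, here is a minimal sketch of one way to copy the sequence defline onto the "+" line of every record. The function name duplicate_deflines and the MAX_LINE limit are my own assumptions, not anything from fastq-multx, and I have not checked whether the rewritten file avoids the segfault.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096  /* assumption: no FASTQ line is longer than this */

/* Hypothetical helper, not part of fastq-multx: copies 4-line FASTQ records
 * from in to out, replacing each '+' separator line with '+' followed by the
 * record's defline text. Returns the number of records written, or -1 if a
 * record is malformed or truncated. */
long duplicate_deflines(FILE *in, FILE *out)
{
    char head[MAX_LINE], seq[MAX_LINE], plus[MAX_LINE], qual[MAX_LINE];
    long records = 0;

    while (fgets(head, sizeof head, in) != NULL) {
        if (head[0] != '@' ||
            fgets(seq, sizeof seq, in) == NULL ||
            fgets(plus, sizeof plus, in) == NULL ||
            fgets(qual, sizeof qual, in) == NULL ||
            plus[0] != '+')
            return -1;
        fputs(head, out);
        fputs(seq, out);
        fprintf(out, "+%s", head + 1);  /* head + 1 skips the leading '@' */
        fputs(qual, out);
        records++;
    }
    return records;
}
```

Calling it with an uncompressed input stream and an output stream (for example stdin and stdout) would produce a copy of the file that can then be passed to fastq-multx in place of the original.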
Oh wait. I assumed this was the same as the other related issue about barcodes in headers... WRT dual index...
if (bcinheader) {
    ignore=getline(&q, &ignore_st, fin[i]);
    ignore=getline(&q, &ignore_st, fin[i]);
    ignore=getline(&q, &ignore_st, fin[i]);
    /// no dual barcode detection allowed
    getbcfromheader(s, &ns);
    printf("bc is %s\n", s);
} else {
The comment in the code suggests that dual barcodes in fastq headers may not be supported.
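To make it concrete what pulling a barcode out of a header involves, here is a rough sketch that takes everything after the last ':' of a defline like the ones in the excerpt above. This is NOT the tool's getbcfromheader; extract_header_barcode is a hypothetical name, and the assumption that the barcode is the last colon-separated field comes only from the example headers in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of fastq-multx: copies the text after the
 * last ':' of a FASTQ defline into out, without any trailing newline.
 * Returns 0 on success, -1 if there is no ':' or out is too small. */
int extract_header_barcode(const char *defline, char *out, size_t outlen)
{
    const char *colon = strrchr(defline, ':');
    if (colon == NULL)
        return -1;

    const char *start = colon + 1;
    size_t len = strcspn(start, "\r\n");  /* length up to the line ending */
    if (len + 1 > outlen)
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

For a dual-indexed read the result still contains both indexes joined by '+', so any matching against the barcode file would have to account for that.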
The tool works well and demultiplexes without issue if I just repeat the input twice (and the output as well), irrespective of dual barcodes. I think this is probably a bug in the code.
Yeah. I gathered that. I was just trying to hazard a guess as to why it fails in the case where it segfaults. If I had more time, I would try and figure it out and fix the bug.
Thank you very much, this helps me.