Demuxing dual-indexed, single-end fastq files with indexes in header fails

Open rikrdo89 opened this issue 2 years ago • 9 comments

I'm trying this tool for the first time to demultiplex a fastq file, Undetermined_L001_R1.fastq.gz, from an Illumina run that used dual indexes and single-end reads. I keep getting a segmentation fault in a number of situations when running this tool.

So I decided to test this tool using the scripts and data in the test folder of this repository. Whenever I use the -H parameter to specify that the indexes are in the read headers, the tool always fails if I supply only one fastq file as input. However, when I supply two fastq files as input (entering the same file twice), the program runs without issues. My commands are below, with an excerpt of the data.

This fails:

$ fastq-multx -H -B indexes.txt test.fastq -o test_%.fastq
Using Barcode File: indexes.txt
Segmentation fault (core dumped)

but this runs (using the input file twice and suppressing one of the outputs, as a sort of workaround):

$ fastq-multx -H -B indexes.txt test.fastq test.fastq -o  test_%.fastq -o n/a
Using Barcode File: indexes.txt
End used: start
Id      Count   File(s)
S1    17800     test_S1.fastq
S2    32100     test_S2.fastq
S3    3900      test_S3.fastq
...

The files I am using look like this:

==> index.txt <==
S1     TTACCGAC-CGTATTCG
S2     TCGTCTGA-TCAAGGAC
S3     TTCCAGGT-AAGCACTG
S4     TACGGTCT-GCAATGGA
S5     AAGACCGT-CAATCGAC

==> test.fastq <==
@A00929:83:HL75TDRXX:1:2101:13919:1047 1:N:0:TTACCGAC+CGTATTCG
CATATTGATAGTTCGCACAGGTAG
+
FFFFFFFFFFFFFFFFFFFFFFFF
@A00929:83:HL75TDRXX:1:2101:14009:1047 1:N:0:TCGTCTGA+TCAAGGAC
GTGCGTATCTATCAAAAATGTATA
+

I installed fastq-multx using conda on Ubuntu 20.04.

rikrdo89 avatar Jul 12 '23 01:07 rikrdo89

It's been a very long time since I've worked in C, but I looked at the code and have a mild hunch... Try duplicating the seq defline on the qual defline and see if you still get a segfault.
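For anyone who wants to try that, here is a minimal sketch of how the sequence defline could be copied onto the "+" line before running the tool. It assumes plain four-line fastq records with no wrapped sequences. The function name `copy_defline_to_plus_line` is made up for illustration and is not part of fastq-multx.

```python
def copy_defline_to_plus_line(lines):
    """Return fastq lines with each bare '+' line replaced by '+<defline text>'.

    Assumes strict four-line records (header, sequence, '+', quality).
    Anything that does not look like a complete record is passed through as-is.
    """
    out = []
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        is_complete = (
            len(record) == 4
            and record[0].startswith("@")
            and record[2].startswith("+")
        )
        if is_complete:
            # Drop the leading '@' and repeat the rest of the header after '+'.
            record = [record[0], record[1], "+" + record[0][1:], record[3]]
        out.extend(record)
    return out


# First record from the excerpt in the issue.
record = [
    "@A00929:83:HL75TDRXX:1:2101:13919:1047 1:N:0:TTACCGAC+CGTATTCG",
    "CATATTGATAGTTCGCACAGGTAG",
    "+",
    "FFFFFFFFFFFFFFFFFFFFFFFF",
]
print("\n".join(copy_defline_to_plus_line(record)))
```

To use it on a real file, read the lines with their newlines stripped, pass them through the function, and write the result to a new fastq file to feed to fastq-multx.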

hepcat72 avatar Jul 12 '23 15:07 hepcat72

Oh wait. I assumed this was the same as the other related issue about barcodes in headers... WRT dual index...

            if (bcinheader) {
                ignore=getline(&q, &ignore_st, fin[i]);
                ignore=getline(&q, &ignore_st, fin[i]);
                ignore=getline(&q, &ignore_st, fin[i]);
                /// no dual barcode detection allowed
                getbcfromheader(s, &ns);
                printf("bc is %s\n", s);
            } else {

The comment in the code seems to suggest that dual barcodes in fastq headers may not be supported?
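As a cross-check that does not depend on the tool, a rough sketch like the one below could pull the dual index out of each header and tally reads per sample. This is not how fastq-multx parses headers. It assumes the index is the last colon-separated field after the first space, that the "+" in the header corresponds to the "-" in the barcode file, and that only exact matches count. The names `barcode_from_header` and `count_samples` are hypothetical.

```python
from collections import Counter


def barcode_from_header(header):
    """Return the index field of an Illumina-style header, with '+' mapped to '-'.

    Example header: '@A00929:...:1047 1:N:0:TTACCGAC+CGTATTCG'
    Returns an empty string if the header has no comment part.
    """
    _, _, comment = header.strip().partition(" ")
    index = comment.rpartition(":")[2]
    return index.replace("+", "-")


def count_samples(headers, barcode_to_sample):
    """Tally headers per sample using exact barcode matches only."""
    counts = Counter()
    for header in headers:
        barcode = barcode_from_header(header)
        counts[barcode_to_sample.get(barcode, "unmatched")] += 1
    return counts


# Barcodes and headers taken from the excerpts in the issue.
samples = {
    "TTACCGAC-CGTATTCG": "S1",
    "TCGTCTGA-TCAAGGAC": "S2",
}
headers = [
    "@A00929:83:HL75TDRXX:1:2101:13919:1047 1:N:0:TTACCGAC+CGTATTCG",
    "@A00929:83:HL75TDRXX:1:2101:14009:1047 1:N:0:TCGTCTGA+TCAAGGAC",
]
print(count_samples(headers, samples))
```

Exact matching keeps the sketch short, so its counts should only be read as a sanity check against the tool's output, not as a replacement for it.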

hepcat72 avatar Jul 12 '23 15:07 hepcat72

The tool works well and demultiplexes without issue if I just repeat the input twice (and the output as well), irrespective of dual barcodes. I think this is probably a bug in the code.

rikrdo89 avatar Jul 12 '23 16:07 rikrdo89

Yeah. I gathered that. I was just trying to hazard a guess as to why it fails in the case where it segfaults. If I had more time, I would try and figure it out and fix the bug.

hepcat72 avatar Jul 12 '23 17:07 hepcat72

Thank you very much, it helps me

duao42 avatar Jul 24 '24 04:07 duao42