
Testing gencore with a dummy fastq results in no clusters

Open ChadFibke opened this issue 4 years ago • 0 comments

Hey @sfchen,

I'm currently testing how gencore handles reads in several scenarios: for example, whether it can rescue reads whose UMIs contain sequencing errors and/or an N, and how it deals with singletons and strand bias (compared to other UMI programs). I made two FASTQ files (read 1 and read 2) containing 50 bp reads (46 genomic nt plus 4 UMI nt), with a 25 bp overlap within each read pair. I then used fastp to extract the UMIs from the reads and bwa to align the reads to chr8.
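For anyone wanting to reproduce a similar test set, here is a minimal sketch of how one such read pair could be built. It is not the script used for the attached files. It assumes the 4 nt UMI sits at the 5' end of each read, that the 25 bp overlap refers to the genomic portions (giving a 67 bp fragment), and that the fragment is random sequence rather than real chr8 sequence; the helper names and the UMI values are hypothetical.

```python
import random

# Complement table for reverse-complementing read 2.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_read_pair(fragment, umi1, umi2, genome_len=46):
    """Build (read1, read2): each is a UMI followed by genomic bases.

    Read 1 comes from the start of the fragment; read 2 is the reverse
    complement of the end of the fragment (assumed layout).
    """
    read1 = umi1 + fragment[:genome_len]
    read2 = umi2 + reverse_complement(fragment[-genome_len:])
    return read1, read2


def fastq_record(name, seq, qual_char="I"):
    """Format one FASTQ record with a constant base quality."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


# 46 + 46 - 25 = 67 bp fragment gives a 25 bp overlap between the mates.
rng = random.Random(0)
fragment = "".join(rng.choice("ACGT") for _ in range(67))

# Hypothetical UMIs, chosen only for illustration.
read1, read2 = make_read_pair(fragment, "ACGT", "TTAG")

print(fastq_record("pair1/1", read1), end="")
print(fastq_record("pair1/2", read2), end="")
```

Duplicating a pair with the same UMIs, or changing one UMI base to another base or to N, would give the error and N cases described above.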

I ran the following command:

gencore -i nov.sorted.bam -o gencore/gencore.bam -r ref/chr8.fa -b NOV.bed -s 1 --umi_prefix = "UMI"

BED: NOV.bed.txt

Sorted Bam: nov.sorted.bam.txt

Gencore recognizes that reads are present, but fails to cluster them:

loading reference data: chr8: 146364022 bp

loaded 1 contigs

1 contigs in the bam file: chr8: 146364022 bp

----Before gencore processing:
Total reads: 30
Total bases: 1380
Mapped reads: 30 (100.000000%)
Mapped bases: 1380 (100.000000%)
Bases mismatched with reference: 12 (0.869565%)
Reads with mismatched bases: 8 (26.666667%)
Total mapping clusters: 0
Mapping clusters with multiple fragments: 0
Total fragments: 0
Fragments with single-end reads: 0
Fragments with paired-end reads: 0
Duplication level histogram:
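As a quick sanity check on the "before" numbers, the short sketch below recomputes the per-read length and the two mismatch percentages from the reported counts. The variable names are only illustrative; the values are copied from the log above.

```python
# Counts copied from the "Before gencore processing" log.
total_reads = 30
total_bases = 1380
mismatched_bases = 12
reads_with_mismatches = 8

# 1380 / 30 = 46, i.e. the genomic part of each read once the 4 nt UMI is removed.
bases_per_read = total_bases / total_reads

pct_mismatched_bases = 100 * mismatched_bases / total_bases
pct_reads_with_mismatches = 100 * reads_with_mismatches / total_reads

print(f"bases per read: {bases_per_read:.0f}")
print(f"mismatched bases: {pct_mismatched_bases:.6f}%")
print(f"reads with mismatches: {pct_reads_with_mismatches:.6f}%")
```

So the input is read as expected (30 reads of 46 nt, all mapped), and the problem seems to be limited to the clustering step.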

----After gencore processing:
Total reads: 0
Total bases: 0
Mapped reads: 0 (nan%)
Mapped bases: 0 (nan%)
Bases mismatched with reference: 0 (nan%)
Reads with mismatched bases: 0 (nan%)
Total mapping clusters: 0
Mapping clusters with multiple fragments: 0
Total fragments: 0
Fragments with single-end reads: 0
Fragments with paired-end reads: 0
Duplication level histogram:

gencore -i nov.sorted.bam -o gencore/gencore.bam -r ref/chr8.fa -b NOV.bed -s 1 --umi_prefix = UMI
gencore v0.13.0, time used: 5 seconds

I was wondering if you could comment on why there is no fragment clustering (in the presence of UMIs and overlap between the reads in the pair)?

ChadFibke, Mar 18 '20 16:03