gencore
Testing gencore with a dummy fastq results in no clusters
Hey @sfchen,
I'm currently testing how gencore handles reads in different scenarios: for example, whether gencore can rescue reads whose UMI contains a sequencing error and/or an N, and how it deals with singletons and strand bias (compared to other UMI programs). I made two FASTQ files (read 1 and read 2) containing 50 bp reads (46 genomic nt plus 4 UMI nt), with a 25 bp overlap between the two reads of each pair. I then used fastp to extract the UMIs and bwa to align the reads to chr8.
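For reference, here is a minimal sketch of how a read pair with this layout can be built. It is an illustration and not the exact script I used: the helper names (`make_pair`, `fastq_record`), the toy random reference, the example UMIs and the read names are all hypothetical, and it assumes the 25 bp overlap is measured on the 46 nt genomic portion of each read.

```python
import random

# Complement table for reverse-complementing read 2 (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_pair(ref, start, umi1, umi2, genome_len=46, overlap=25):
    """Build one read pair: each read is UMI + genome_len genomic bases.

    Read 1 is taken from the forward strand at `start`; read 2 is the
    reverse complement of a window that shares `overlap` bases with read 1.
    (Assumption: the overlap is counted on the genomic portion only.)
    """
    r1_insert = ref[start:start + genome_len]
    r2_start = start + genome_len - overlap
    r2_insert = ref[r2_start:r2_start + genome_len]
    return umi1 + r1_insert, umi2 + revcomp(r2_insert)


def fastq_record(name, seq, qual_char="I"):
    """Format a single FASTQ record with a constant quality string."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


# Toy reference standing in for a stretch of chr8 (hypothetical data).
random.seed(0)
toy_ref = "".join(random.choice("ACGT") for _ in range(200))

r1, r2 = make_pair(toy_ref, 10, umi1="ACGT", umi2="TTAG")
print(fastq_record("frag1/1", r1), end="")
print(fastq_record("frag1/2", r2), end="")
```

With these defaults each read is 50 nt (4 UMI + 46 genomic) and the fragment spans 46 + 46 - 25 = 67 reference bases.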
I ran the following command:
gencore -i nov.sorted.bam -o gencore/gencore.bam -r ref/chr8.fa -b NOV.bed -s 1 --umi_prefix = "UMI"
BED: NOV.bed.txt
Sorted Bam: nov.sorted.bam.txt
Gencore recognizes that reads are present, but fails to cluster them:
loading reference data: chr8: 146364022 bp
loaded 1 contigs
1 contigs in the bam file: chr8: 146364022 bp
----Before gencore processing:
Total reads: 30
Total bases: 1380
Mapped reads: 30 (100.000000%)
Mapped bases: 1380 (100.000000%)
Bases mismatched with reference: 12 (0.869565%)
Reads with mismatched bases: 8 (26.666667%)
Total mapping clusters: 0
Mapping clusters with multiple fragments: 0
Total fragments: 0
Fragments with single-end reads: 0
Fragments with paired-end reads: 0
Duplication level histogram:

----After gencore processing:
Total reads: 0
Total bases: 0
Mapped reads: 0 (nan%)
Mapped bases: 0 (nan%)
Bases mismatched with reference: 0 (nan%)
Reads with mismatched bases: 0 (nan%)
Total mapping clusters: 0
Mapping clusters with multiple fragments: 0
Total fragments: 0
Fragments with single-end reads: 0
Fragments with paired-end reads: 0
Duplication level histogram:

gencore -i nov.sorted.bam -o gencore/gencore.bam -r ref/chr8.fa -b NOV.bed -s 1 --umi_prefix = UMI
gencore v0.13.0, time used: 5 seconds
Could you comment on why no fragments are clustered, even though the reads carry UMIs and the two reads of each pair overlap?