ATACseqQC icon indicating copy to clipboard operation
ATACseqQC copied to clipboard

Output bam files become smaller after shiftGAlignmentsList

Open Rui-Jing opened this issue 2 years ago • 4 comments

Hi, Thanks for developing this user-friendly tool for ATAC-seq. I found some problems after running shiftGAlignmentsList, and the output BAM file is about half the size of the input file. I don't know if this is a normal situation or if ATACseqQC does some filtering.

My code :

library(ATACseqQC)
bamfile <- '/home/zfzf/atac_ljh/alignment/neg2_701504_sorted.bam'
bamfile.labels <- gsub(".bam", "", basename(bamfile))
#bamQC(bamfile, outPath=NULL)
estimateLibComplexity(readsDupFreq(bamfile))
## generate fragement size distribution
fragSize <- fragSizeDist(bamfile, bamfile.labels)


## bamfile tags to be read in
possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2", 
                                "HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM",
                                "TC", "UQ"), 
                    "character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR",
                                  "CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD",
                                  "MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU",
                                  "Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS",
                                  "U2"))

library(Rsamtools)

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),
                     param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
tags

## files will be output into outPath
outPath <- "align_adjusted"
dir.create(outPath)

## shift the coordinates of 5'ends of alignments in the bam file
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
gal <- readBamFile(bamfile, tag=tags, asMates=TRUE, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)

Rui-Jing avatar Jun 09 '22 02:06 Rui-Jing

Hi Rui-Jing,

Thank you for trying ATACseqQC to analyze your ATAC-seq data. Could you let me know the size of file by running the following code?

samtools view -b -f 2 -F 2816 neg2_701504_sorted.bam > clean.reads.bam

Thank you.

Jianhong.

From: Rui-Jing @.> Date: Wednesday, June 8, 2022 at 10:42 PM To: jianhong/ATACseqQC @.> Cc: Subscribed @.***> Subject: [jianhong/ATACseqQC] Output bam files become smaller after shiftGAlignmentsList (Issue #51)

Hi, Thanks for developing this user-friendly tool for ATAC-seq. I found some problems after running shiftGAlignmentsList, and the output BAM file is about half the size of the input file. I don't know if this is a normal situation or if ATACseqQC does some filtering.

My code :

library(ATACseqQC)

bamfile <- '/home/zfzf/atac_ljh/alignment/neg2_701504_sorted.bam'

bamfile.labels <- gsub(".bam", "", basename(bamfile))

#bamQC(bamfile, outPath=NULL)

estimateLibComplexity(readsDupFreq(bamfile))

generate fragement size distribution

fragSize <- fragSizeDist(bamfile, bamfile.labels)

bamfile tags to be read in

possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2",

                            "HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM",

                            "TC", "UQ"),

                "character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR",

                              "CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD",

                              "MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU",

                              "Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS",

                              "U2"))

library(Rsamtools)

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),

                 param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag

tags <- names(bamTop100)[lengths(bamTop100)>0]

tags

files will be output into outPath

outPath <- "align_adjusted"

dir.create(outPath)

shift the coordinates of 5'ends of alignments in the bam file

library(TxDb.Mmusculus.UCSC.mm10.knownGene)

gal <- readBamFile(bamfile, tag=tags, asMates=TRUE, bigFile=TRUE)

shiftedBamfile <- file.path(outPath, "shifted.bam")

gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)

— Reply to this email directly, view it on GitHubhttps://github.com/jianhong/ATACseqQC/issues/51, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABLBEAZ3LMXRPFKLU7VKBDDVOFKYTANCNFSM5YIPMRXA. You are receiving this because you are subscribed to this thread.Message ID: @.***>

jianhong avatar Jun 09 '22 12:06 jianhong

Thanks for your quikly reply. The input bam size is 4810847KB, and the output bam size is 2120454KB. So, I wonder if ATACseqQC has done some filtering?

Rui-Jing avatar Jun 09 '22 13:06 Rui-Jing

You may want to ask help by ?readBamFile There is flag filters when you read bam file by ATACseqQC. You can change the flags to meet your design.

Jianhong.

From: Rui-Jing @.> Date: Thursday, June 9, 2022 at 9:26 AM To: jianhong/ATACseqQC @.> Cc: JIANHONG OU @.>, Comment @.> Subject: Re: [jianhong/ATACseqQC] Output bam files become smaller after shiftGAlignmentsList (Issue #51)

Thanks for your quikly reply. The input bam size is 4810847KB, and the output bam size is 2120454KB. So, I wonder if ATACseqQC has done some filtering?

— Reply to this email directly, view it on GitHubhttps://github.com/jianhong/ATACseqQC/issues/51#issuecomment-1151118336, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABLBEA2CLQ2EUJVEPYNOZT3VOHWJFANCNFSM5YIPMRXA. You are receiving this because you commented.Message ID: @.***>

jianhong avatar Jun 09 '22 13:06 jianhong

Great, thanks for the feedback !

Rui-Jing avatar Jun 09 '22 13:06 Rui-Jing